Abstract

Random  variables  can  be  described  by  their  cumulative  distribution  functions,  a  class 
of  nondecreasing  functions  on  the  real  line.  Those  functions  can  in  turn  be  identified,  after 
the  possible  vertical  gaps  in  their  graphs  are  filled  in,  with  maximal  monotone  relations. 
Such  relations  are  known  to  be  the  subdifferentials  of  convex  functions. 

Analysis  of  these  connections  yields  new  insights.  The  generalized  inversion  operation 
between  distribution  functions  and  quantile  functions  corresponds  to  graphical  inversion  of 
monotone  relations.  In  subdifferential  terms,  it  corresponds  to  passing  to  conjugate  convex 
functions  under  the  Legendre-Fenchel  transform.  Among  other  things,  this  shows  that 
convergence  in  distribution  for  sequences  of  random  variables  is  equivalent  to  graphical 
convergence  of  the  monotone  relations  and  epigraphical  convergence  of  the  associated 
convex  functions. 

Measures of risk that employ quantiles (VaR) and superquantiles (CVaR), either
individually or in mixtures, are illuminated in this way. Formulas for their calculation are
seen from a perspective that reveals how they were discovered. The approach leads
further to developments in which the superquantiles for a given distribution are interpreted
as the quantiles for an overlying “superdistribution.” In this way a generalization of
Koenker-Bassett error is derived which lays a foundation for superquantile regression as a
higher-order extension of quantile regression.
1  Introduction


The  aim  of  this  article  is  to  promote  a  way  of  looking  at  fundamental  concepts  in  probability 
and  statistics  by  embedding  them  in  a  framework  of  convex  analysis.  The  key  is  a  thorough 
duality between cumulative distribution functions on $(-\infty, \infty)$ and quantile functions on $(0, 1)$,
based on identifying them with the one-sided derivatives of conjugate pairs of convex functions.

Motivation for this framework comes from the modeling of risk in optimization under
uncertainty, along with applications to stochastic estimation and approximation. Sharply in focus,
beyond distribution functions and quantiles, are “superquantiles,” which are quantifications of
random variables now recognized as essential building blocks for “measures of risk” in finance
and engineering. Superquantiles fit most simply and naturally with random variables having
a cost/loss/damage orientation, in tune with the conventions of optimization theory in which
functions are minimized and inequality constraints are normalized to “$\le$” form. The upper tails
of the distributions of such random variables are usually then of more concern than the lower
tails. Making the corresponding adjustments in formulation and terminology from previous work
having the opposite orientation is one of our ongoing themes here. First- and second-order
stochastic dominance are adapted to this perspective, in particular.

A  further  benefit  of  the  convex  analysis  framework  is  new  characterizations  of  convergence 
in  distribution,  a  widely  used  property  of  approximation.  Our  analysis  indicates  moreover 
how quantile regression, as an alternative to least-squares regression in statistics, can be
bootstrapped into a new higher-order approximation tool centered instead on superquantiles.
Helpful estimates of superquantiles, for numerical work and more, are derived as well.
Second-derivative duality in convex analysis further produces a duality between distribution densities
and  quantile  densities. 

Distribution  functions  versus  quantile  functions.  The  path  to  these  developments 
begins with elementary observations in a two-dimensional graphical setting with pairs of
nondecreasing functions in an extended inverse-like relationship.

A real-valued random variable $X$ gives a probability measure on the real line $\mathbb{R}$ which can
be described by the (cumulative) distribution function $F_X$ for $X$, namely

$$F_X(x) = \operatorname{prob}\{X \le x\} \quad\text{for } x \in (-\infty, \infty). \tag{1.1}$$

The function $F_X$ is nondecreasing and right-continuous on $(-\infty, \infty)$, and it tends to 0 as
$x \to -\infty$ and to 1 as $x \to \infty$. These properties characterize the class of functions that furnish
distributions  of  random  variables.  Right-continuity  of  a  nondecreasing  function  reduces  to 
continuity  except  at  points  where  the  graph  has  a  vertical  gap.  The  set  of  such  jump  points,  if 
any,  has  to  be  finite  or  countably  infinite. 

The probability measure associated with a random variable $X$ can alternatively be described
by its quantile function $Q_X$, namely

$$Q_X(p) = \min\{x \mid F_X(x) \ge p\} \quad\text{for } p \in (0, 1), \tag{1.2}$$

so that $Q_X(p)$ is the lowest $x$ such that $\operatorname{prob}\{X > x\} \le 1 - p$. The function $Q_X$ is nondecreasing
and left-continuous on $(0, 1)$, and those properties characterize the class of functions that furnish
quantiles of random variables. The correspondence between distribution functions and quantile
functions is one-to-one, with $F_X$ recoverable from $Q_X$ by

$$F_X(x) = \begin{cases} \max\{p \mid Q_X(p) \le x\} & \text{for } x \in (\inf Q_X, \sup Q_X), \\ 1 & \text{for } x \ge \sup Q_X, \\ 0 & \text{for } x < \inf Q_X. \end{cases} \tag{1.3}$$

Figure 1: distribution function $F_X$ and quantile function $Q_X$

The quantile function $Q_X$, like the distribution function $F_X$, can have at most countably many
jumps where it fails to be continuous. The vertical gaps in the graph of $Q_X$ correspond to the
horizontal segments in the graph of $F_X$, and vice versa, as seen in Figure 1. It follows that $F_X$
and $Q_X$ can likewise have at most countably many horizontal segments in their graphs.

When the graph of $F_X$ has no vertical gaps or horizontal segments, so that $F_X$ is not only
continuous but (strictly) increasing, the “min” in (1.2) is superfluous and $Q_X(p)$ is the unique
solution $x$ to $F_X(x) = p$. Then $Q_X$ is just the inverse $F_X^{-1}$ of $F_X$ on $(0, 1)$. Without such
restriction, though, one can only count on $Q_X(F_X(x)) \le x$ and $F_X(Q_X(p)) \ge p$, along with

$$F_X(x) \ge p \iff Q_X(p) \le x. \tag{1.4}$$
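
As a small numerical illustration of (1.2) and (1.4), the following sketch builds a distribution function and its quantile function for a three-point distribution and checks the equivalence on a grid. The distribution, the helper names `F` and `Q`, and the grids are illustrative assumptions, not part of the original development; exact rational arithmetic is used so that the comparisons are not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical three-point distribution: (outcome, probability), sorted by outcome.
OUTCOMES = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def F(x):
    """Distribution function F_X(x) = prob{X <= x}."""
    return sum((w for v, w in OUTCOMES if v <= x), Fraction(0))

def Q(p):
    """Quantile function Q_X(p) = min{x | F_X(x) >= p} for 0 < p < 1.

    F_X only increases at outcomes, so the minimum is attained at one of them.
    """
    return min(v for v, _ in OUTCOMES if F(v) >= p)

xs = [Fraction(k, 4) for k in range(-4, 17)]    # grid on [-1, 4]
ps = [Fraction(k, 20) for k in range(1, 20)]    # grid inside (0, 1)

# (1.4): F_X(x) >= p holds exactly when Q_X(p) <= x.
equivalence_holds = all((F(x) >= p) == (Q(p) <= x) for x in xs for p in ps)

# One-sided inequalities that replace exact inversion at jumps and flat pieces.
inversion_bounds_hold = (
    all(F(Q(p)) >= p for p in ps)
    and all(Q(F(x)) <= x for x in xs if 0 < F(x) < 1)
)

print(Q(Fraction(1, 4)), Q(Fraction(1, 2)), Q(Fraction(4, 5)))  # 0 1 3
print(equivalence_holds, inversion_bounds_hold)                 # True True
```

In this example the jump of the distribution function at the outcome 1 shows up as a flat piece of the quantile function on the interval (1/4, 3/4], which is the gap/segment correspondence described above.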

The generalized inversion represented in (1.4) and formulas (1.2) and (1.3) can be given
a solid footing in geometry. By filling in the vertical gaps in the graphs of $F_X$ and $Q_X$, and
adding infinite vertical segments at the right and left ends of the resulting “curve” for $Q_X$ to
mimic the infinite horizontal segments which appear at the ends of the graph of $F_X$ when the
range of $X$ is bounded from above or below, one obtains the “curves” $\Gamma_X$ and $\Delta_X$ of Figure 2.
These “curves” are the reflections of each other across the 45-degree line in $\mathbb{R} \times \mathbb{R}$ where $x = p$.

Figure 2: the relation $\Gamma_X$ and its inverse $\Delta_X$

In the classical mindset it would be anathema to fill in vertical gaps in the graph of a
function, thereby ruining its status as a “function” (single-valued). In this situation, though,
there are overriding advantages. The graphs $\Gamma_X$ and $\Delta_X$ belong to a class of subsets of $\mathbb{R}^2$
called maximal monotone relations. Such relations have powerful properties and are basic to
convex analysis, which identifies them with the “subdifferentials” of convex functions. This will
be recalled in Section 2. The graphical inversion in Figure 2, where

$$\Delta_X = \{(p, x) \mid (x, p) \in \Gamma_X\}, \qquad \Gamma_X = \{(x, p) \mid (p, x) \in \Delta_X\},$$

will  be  portrayed  there  as  corresponding  to  the  Legendre-Fenchel  transform,  which  dualizes  a 
convex  function  by  pairing  it  with  a  conjugate  convex  function. 

Although monotone relations are central in this paper, the idea of looking at conjugate pairs
of convex functions defined in one way or another through direct integration of $F_X$ and $Q_X$ is
not new; cf. Ogryczak and Ruszczyński [14] and subsequently [15], [16]. What is different here
is a choice of functions that better suits random variables with cost/loss/damage orientation in
handling their upper tails. The need for such a switch for purposes in stochastic optimization
has recently motivated Dentcheva and Martinez [4] to adapt also in that direction, but our
approach seems to achieve that more simply and comprehensively.

The convergence theory for maximal monotone relations and the convex functions having
them as their subdifferentials can be coordinated in our framework with results about the
convergence of sequences of random variables. A special feature is that approximations on the side
of convex analysis are most effectively studied through “set convergence.” Maximal monotone
relations are compared in terms of distances to their graphs, while convex functions are
compared in terms of distances to their epigraphs rather than their graphs. Indeed, epigraphical
convergence of convex functions is tantamount to graphical convergence of their subdifferentials.
Epigraphical convergence is the only topological convergence that renders the Legendre-Fenchel
transform continuous.

For a sequence of random variables $X_k$ this leads, as we demonstrate in Section 3, to
fresh characterizations of “convergence in distribution” of $X_k$ to a random variable $X$. It
corresponds to graphical convergence of the associated maximal monotone relations $\Gamma_{X_k}$ to
$\Gamma_X$, or for that matter $\Delta_{X_k}$ to $\Delta_X$, and to epigraphical convergence of the convex functions
having their subdifferentials described by those relations. That epigraphical convergence, in
this special context, can essentially be reduced to pointwise convergence.

Superquantile functions. In associating $\Gamma_X$ with a convex function having it as
subdifferential, and then investigating the conjugate of that convex function, information is gained
about superquantiles of random variables. Superquantiles refer to values which, like quantiles,
capture all the information about the distribution of a random variable, but in doing that avoid
some of the troublesome properties of quantiles such as potential discontinuity and instability
with respect to parameterization. They have been studied under different names for
modeling risk in finance, but here we are translating them to the general theory of statistics and
probability. Bringing out their significance in that environment is one of our goals.

For a random variable $X$ with cost/loss/damage orientation, the superquantile $\bar{Q}_X(p)$ at
probability level $p \in (0, 1)$ has two equivalent expressions which look quite different. First,

$$\bar{Q}_X(p) = \text{expectation in the (upper) } p\text{-tail distribution of } X. \tag{1.5}$$

This refers to the probability distribution on $[Q_X(p), \infty)$ which, in the case of $F_X(Q_X(p)) = p$,
is the conditional distribution of $X$ subject to $X > Q_X(p)$, but which “rectifies” that conditional
distribution when $F_X$ has a jump at the quantile $Q_X(p)$, so that $F_X(Q_X(p)) > p$. In the latter
case there is a probability atom at $Q_X(p)$ causing the interval $[Q_X(p), \infty)$ to have probability
larger than $1 - p$ and the interval $(Q_X(p), \infty)$ to have probability smaller than $1 - p$. To
take care of the discrepancy, the $p$-tail distribution is defined in general as having
$F_X^{(p)}(x) = \max\{0, F_X(x) - p\}/(1 - p)$ as its distribution function. This amounts to an appropriate
splitting of the probability atom at $Q_X(p)$. The second expression for the superquantile is

$$\bar{Q}_X(p) = \frac{1}{1 - p} \int_p^1 Q_X(p')\, dp'. \tag{1.6}$$

The  equivalence  between  the  two  expressions  will  be  explained  in  Section  3,  which  will  also 
clarify  the  restrictions  that  need  to  be  imposed  to  ensure  both  are  well  defined. 
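
The agreement of the two expressions can be checked by hand on a small discrete example. The sketch below, which assumes a hypothetical three-point distribution and illustrative function names, computes the superquantile once as the expectation under the p-tail distribution function max{0, F_X(x) - p}/(1 - p) and once as the normalized integral of the piecewise constant quantile function over (p, 1).

```python
from fractions import Fraction

# Hypothetical three-point distribution: (outcome, probability), sorted by outcome.
OUTCOMES = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def F(x):
    """Distribution function F_X(x) = prob{X <= x}."""
    return sum((w for v, w in OUTCOMES if v <= x), Fraction(0))

def superquantile_tail(p):
    """Expectation under the p-tail distribution, in the sense of (1.5)."""
    total, prev = Fraction(0), Fraction(0)
    for v, _ in OUTCOMES:
        cur = max(Fraction(0), F(v) - p) / (1 - p)   # tail distribution function at v
        total += v * (cur - prev)                    # tail mass placed on outcome v
        prev = cur
    return total

def superquantile_integral(p):
    """(1 / (1 - p)) times the integral of Q_X over (p, 1), in the sense of (1.6)."""
    total, lower = Fraction(0), Fraction(0)
    for v, w in OUTCOMES:
        upper = lower + w                            # Q_X equals v on (lower, upper]
        total += v * max(Fraction(0), upper - max(p, lower))
        lower = upper
    return total / (1 - p)

p = Fraction(1, 2)
print(superquantile_tail(p), superquantile_integral(p))   # 2 2
```

At p = 1/2 the quantile is 1, where the distribution has an atom of mass 1/2. Conditioning on X >= 1 would give 5/3 and conditioning on X > 1 would give 3; the value 2 comes from assigning only half of the atom to the tail, which is the splitting described above.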

In finance, the quantile $Q_X(p)$ is identical to the popular notion of the value-at-risk $\mathrm{VaR}_p(X)$
of $X$ at probability level $p$.³ The superquantile $\bar{Q}_X(p)$ as defined by (1.5) goes back to
Rockafellar and Uryasev [22] (and an earlier working paper of 1999), with follow-up in [23]. There it was
called conditional value-at-risk, as suggested by (1.5) and its interpretation as the conditional
expectation of $X$ subject to $X > Q_X(p)$ when $F_X$ has no jump at the quantile $Q_X(p)$. It was
denoted by $\mathrm{CVaR}_p(X)$ in order to contrast it with $\mathrm{VaR}_p(X)$. The expression on the right side
of (1.6) was independently introduced around the same time by Acerbi [1] as “expected
shortfall,”⁴ but the equivalence of the two was soon realized. Because statistical terminology ought
to be free from dependence on financial terminology, we think it helpful to have “superquantile”
available as a neutral alternative name. This was suggested in our paper [20] on reliability in
engineering and has been pursued further in the “risk quadrangle” setting of [24].

Figure 3: the $p$th quantile and the $p$-tail

That side-by-side approach advantageously suggests making a graphical comparison between
the superquantile function $\bar{Q}_X$ and the quantile function $Q_X$, as in Figure 4. A new and
immediate insight is that $\bar{Q}_X$ is the inverse of a distribution function $\bar{F}_X$ generated from $F_X$.
We call this the corresponding superdistribution function. It lets the superquantiles of $X$ be
identified as the quantiles for an auxiliary probability measure on $(-\infty, \infty)$. Specifically, $\bar{F}_X$
is the distribution function for an auxiliary random variable $\bar{X}$ derived from $X$, and this will
have valuable consequences:

$$\bar{F}_X = F_{\bar{X}} \quad\text{for the random variable } \bar{X} = \bar{Q}_X(F_X(X)). \tag{1.7}$$

Figure 4: superdistribution function $\bar{F}_X$ and superquantile function $\bar{Q}_X$

³ This is the case for cost/loss-oriented random variables. In applications centered on random variables $Y$
that are profit/gain-oriented, the value-at-risk of $Y$ at probability level $p$ is $-Q_Y(1 - p)$. The avoidance of
such complications with minus signs is one of the reasons why we prefer cost/loss orientation in setting forth
principles for use in statistics and probability with applications to optimization.

⁴ The interpretation of this integral as an average led Föllmer and Schied [7] to instead call this quantity
“average value-at-risk” with notation AVaR.

Motivating  connections  with  risk.  Although  superquantiles  have  not  previously  been 
touted  as  a  potentially  significant  addition  to  basic  statistics,  their  importance  in  formulating 
problems  of  stochastic  optimization  is  already  well  recognized.  A  brief  discussion  of  “risk”  will 
help  in  understanding  the  interest  in  them  coming  from  that  direction. 

A measure of risk is a functional $\mathcal{R}$ that assigns to a random variable $X$ a value $\mathcal{R}(X)$
in $(-\infty, \infty]$ as a quantification of the risk in it.⁵ The context here is that of $X$ representing
a generalized “cost,” “loss” or “damage” index, meaning that lower values are preferred to
higher values. Typically it is desired to have the outcomes of $X$ below a threshold $b$, but some
violations may have to be accepted. For instance, it would be nice if the losses for a given
portfolio of financial assets were always $\le 0$, but arranging for that might not be feasible. How
then can trade-off preferences be captured? How can the desire to have $X$ be “adequately” $\le b$
in its outcomes be given a mathematical formulation? The role of a risk measure $\mathcal{R}$ is to model
this as $\mathcal{R}(X) \le b$.

Specific examples can help in appreciating the issues. In taking $\mathcal{R}(X) = E[X]$ (expectation),
the interpretation of $\mathcal{R}(X) \le b$ is that the outcomes of $X$ are $\le b$ “on average.” That choice
could be strengthened by taking the measure of risk to be $\mathcal{R}(X) = E[X] + \lambda\sigma(X)$ for a
parameter value $\lambda > 0$, where $\sigma(X)$ denotes standard deviation. Then the interpretation of
$\mathcal{R}(X) \le b$ is that outcomes of $X$ above $b$ can only be in the part of the distribution of $X$
lying more than $\lambda$ standard deviation units beyond the expectation. Such a “safety margin”
approach is attractive for its resemblance to confidence levels in statistics. A third choice of
risk measure, aimed at enforcing certainty, is to take $\mathcal{R}(X) = \sup X$ (the essential supremum
of $X$, which might be $\infty$). Then $\mathcal{R}(X) \le b$ means there is zero probability of an outcome $> b$.

⁵ Measures of risk are not “measures” in the usual sense of mathematics. This terminology, in which a
“measure” is a “quantification,” is widespread in finance.

Two further examples, more directly in line with our interests in this article, are quantiles,
$\mathcal{R}(X) = Q_X(p)$, and superquantiles, $\mathcal{R}(X) = \bar{Q}_X(p)$, as measures of risk. The corresponding
interpretations of having $\mathcal{R}(X) \le b$ are as follows:

$$Q_X(p) \le b \iff \operatorname{prob}\{X \le b\} \ge p \iff \operatorname{prob}\{X > b\} \le 1 - p, \tag{1.8}$$

$$\bar{Q}_X(p) \le b \iff \text{even in its } p\text{-tail distribution, } X \text{ is } \le b \text{ on average.} \tag{1.9}$$

Probabilistic constraints as in (1.8) have a wide following. In contrast, the condition in (1.9)
might appear arbitrary and hard to work with. But it has serious motivation in the theory
of risk, plus the virtue of taking into account some degree of effects in the upper tail of the
distribution of $X$ beyond the threshold $b$.⁶

A feature of risk theory that elevates superquantiles above quantiles is found in the notion
of coherency proposed by Artzner et al. [2], originally for purposes of determining appropriate
cash reserves in the banking industry. Coherency of a risk measure $\mathcal{R}$ entails having

$$\begin{aligned}
&\mathcal{R}(C) = C \text{ for constant random variables } X \equiv C, \\
&\mathcal{R}(X) \le \mathcal{R}(X') \text{ when } X \le X' \text{ almost surely}, \\
&\mathcal{R}(X + X') \le \mathcal{R}(X) + \mathcal{R}(X'), \\
&\mathcal{R}(\lambda X) = \lambda \mathcal{R}(X) \text{ for } \lambda > 0.
\end{aligned} \tag{1.10}$$

Along with the surface meaning of these axioms,⁷ there are crucial implications for preserving
convexity when measures of risk are employed in optimization. This is explained from several
angles in [24].

For the examples above, coherency holds for the extreme choices $\mathcal{R}(X) = E[X]$ and
$\mathcal{R}(X) = \sup X$, but it is absent in general for $\mathcal{R}(X) = E[X] + \lambda\sigma(X)$ with $\lambda > 0$ (because
the monotonicity axiom fails) and for $\mathcal{R}(X) = Q_X(p)$ (because the subadditivity axiom fails).
However, coherency does hold for $\mathcal{R}(X) = \bar{Q}_X(p)$. Moreover it holds for weighted sums like
$\mathcal{R}(X) = \sum_k \lambda_k \bar{Q}_X(p_k)$ with $\lambda_k > 0$ and $\sum_k \lambda_k = 1$, and even for “continuous” versions of
those sums, $\mathcal{R}(X) = \int_0^1 \bar{Q}_X(p)\, d\lambda(p)$ for a probability measure $\lambda$ on $(0, 1)$.⁸ In fact, a functional
$\mathcal{R}(X)$ expressible as the supremum of a collection of such superquantile integrals is known to
be the most general kind of coherent measure of risk that depends only on $F_X$ and possesses a
certain continuity property; see [10] and [7, Section 2.6].

⁶ In reliability terms, with outcomes $X > b$ signaling “failure,” $1 - p$ is the probability of failure in (1.8) and
the buffered probability of failure in (1.9), cf. [20].

⁷ The subadditivity inequality, for instance, says, in the context of cash reserves in finance, that if the cash
amount $\mathcal{R}(X)$ is adequate for covering the risks in a portfolio with losses described by the random variable $X$,
and $\mathcal{R}(X')$ is enough for a separate portfolio with losses described by $X'$, then the sum of these amounts should
cover the combined portfolio. This supports the idea of diversification of assets. See Föllmer and Schied [7] for
more about the role of coherency in finance.

⁸ Such expressions relate strongly to “dual utility theory,” the foundations of which have recently been
strengthened by Dentcheva and Ruszczyński [6].

This  makes  clear  that  superquantiles  are  basic  to  the  foundations  of  risk  theory  and  further 
explains  why  we  are  intent  here  on  positioning  them  prominently  in  view. 
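
The contrast between quantiles and superquantiles with respect to subadditivity can be made concrete with a standard textbook-style example, sketched here under assumptions of our own choosing: two independent hypothetical loans, each losing 1 with probability 1/25 and 0 otherwise, examined at probability level p = 19/20. The function names are illustrative, and the superquantile is computed from the integral expression (1.6).

```python
from fractions import Fraction
from itertools import product

def quantile(dist, p):
    """Q(p) = min{x | F(x) >= p} for a discrete distribution {value: probability}."""
    acc = Fraction(0)
    for v in sorted(dist):
        acc += dist[v]
        if acc >= p:
            return v
    raise ValueError("p must lie in (0, 1)")

def superquantile(dist, p):
    """(1 / (1 - p)) times the integral of the quantile function over (p, 1)."""
    lower, total = Fraction(0), Fraction(0)
    for v in sorted(dist):
        upper = lower + dist[v]                      # quantile equals v on (lower, upper]
        total += v * max(Fraction(0), upper - max(p, lower))
        lower = upper
    return total / (1 - p)

def independent_sum(d1, d2):
    """Distribution of X + X' for independent discrete X and X'."""
    out = {}
    for (v1, w1), (v2, w2) in product(d1.items(), d2.items()):
        out[v1 + v2] = out.get(v1 + v2, Fraction(0)) + w1 * w2
    return out

loan = {0: Fraction(24, 25), 1: Fraction(1, 25)}     # hypothetical single-loan loss
both = independent_sum(loan, loan)
p = Fraction(19, 20)

print(quantile(loan, p), quantile(both, p))            # 0 1
print(superquantile(loan, p), superquantile(both, p))  # 4/5 129/125
```

Each loan alone has quantile 0 at this level, yet the combined loss has quantile 1, so the quantile of the sum exceeds the sum of the quantiles. The superquantiles are 4/5 for each loan and 129/125 for the pair, which respects the subadditivity bound 8/5.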

Although the defining formulas for superquantiles might raise a perception of them being
troublingly complicated or even intractable in comparison to quantiles, quite the opposite is
true. A double formula due to Rockafellar and Uryasev [22, 23] brings them together in a way
that supports practical methods of computation while bypassing technical issues in the defining
formulas (1.5) and (1.6):

$$\begin{aligned}
\bar{Q}_X(p) &= \min_x \{\, x + \mathcal{V}_p(X - x) \,\}, \quad\text{where } \mathcal{V}_p(X) = \tfrac{1}{1-p} E[\max\{0, X\}], \\
Q_X(p) &= \operatorname{argmin}_x \{\, x + \mathcal{V}_p(X - x) \,\} \quad\text{(left endpoint, if this is not a singleton).}
\end{aligned} \tag{1.11}$$

The “argmin,” consisting of the $x$ values for which the minimum is attained, is, in this formula,
a nonempty, closed, bounded interval which typically reduces to a single $x$. The functional $\mathcal{V}_p$
satisfies

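$$\begin{aligned}
&\mathcal{V}_p(X) \le \mathcal{V}_p(X') \text{ when } X \le X' \text{ almost surely}, \\
&\mathcal{V}_p(X + X') \le \mathcal{V}_p(X) + \mathcal{V}_p(X'), \\
&\mathcal{V}_p(\lambda X) = \lambda \mathcal{V}_p(X) \text{ for } \lambda > 0, \\
&\mathcal{V}_p(X) \ge E[X], \text{ with equality holding only when } X \equiv 0.
\end{aligned}$$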

Such properties are associated with regular “measures of regret” (rather than risk) in the
terminology of [24], and it is appropriate therefore, in view of (1.11), to refer to $\mathcal{V}_p$ as quantile
regret. The functional $\mathcal{E}_p(X) = \mathcal{V}_p(X) - E[X]$ paired with $\mathcal{V}_p$ by [24] as its associated “measure
of error” is normalized Koenker-Bassett error. It underlies quantile regression as a statistical
methodology offering an alternative to least-squares regression [8], [9].

In models of stochastic optimization that incorporate superquantiles in constraints or
objectives, the superquantile formula in (1.11) can be substituted in each instance with an associated
auxiliary variable in the overall minimization. This greatly simplifies computations and
simultaneously yields values for the corresponding quantiles in the solution; cf. [22, 23]. No such
computational  help  is  available  for  constraints  and  objectives  expressed  in  quantiles  instead 
of  superquantiles.  As  the  formulas  in  (1.11)  underscore  for  anyone  familiar  with  the  relative 
behavior  of  “min”  and  “argmin”  in  numerical  optimization,  quantiles  are  inherently  less  stable 
than  superquantiles  in  circumstances  where  random  variables  depend  on  decision  parameters. 
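
The minimization formula for the superquantile and the accompanying argmin formula for the quantile can be tried out on a discrete example. In the sketch below the three-point distribution and the function names are assumptions made for illustration. For a discrete distribution the objective x + E[max{0, X - x}]/(1 - p) is convex and piecewise linear with kinks only at the outcomes, so it is enough to evaluate it there.

```python
from fractions import Fraction

# Hypothetical three-point distribution: (outcome, probability), sorted by outcome.
OUTCOMES = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def regret_of_shift(x, p):
    """Quantile regret of X - x, namely E[max{0, X - x}] / (1 - p)."""
    return sum(w * max(0, v - x) for v, w in OUTCOMES) / (1 - p)

def min_and_argmin(p):
    """Return (minimum value, left endpoint of the argmin) of x + regret_of_shift(x, p).

    Only outcomes are tested: the objective is piecewise linear and convex with
    breakpoints at the outcomes, so the minimum and the left endpoint of the
    argmin interval both occur at outcomes.
    """
    values = {v: v + regret_of_shift(v, p) for v, _ in OUTCOMES}
    best = min(values.values())
    left_endpoint = min(v for v, g in values.items() if g == best)
    return best, left_endpoint

print(min_and_argmin(Fraction(1, 2)))   # minimum 2 attained only at x = 1
print(min_and_argmin(Fraction(1, 4)))   # minimum 5/3 on the interval [0, 1], left endpoint 0
```

At p = 1/2 the minimum value 2 is the superquantile and the minimizer 1 is the quantile. At p = 1/4 the argmin is the whole interval [0, 1], and its left endpoint 0 is the quantile, which shows why the left-endpoint convention is needed and hints at why the argmin is less stable than the minimum value.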

A byproduct of the connections explored here between distributions, monotone relations,
and the convex functions associated with them subdifferentially will be an explanation, for the
first time, of how (1.11) was discovered in the background of [22, 23]. It came from recognition
of  the  consequences  of  applying  the  Legendre-Fenchel  transformation  to  those  convex  functions. 

A further goal of this article is to develop, in Section 4, a formula along the lines of (1.11)
in which the argmin gives the superquantile instead of the quantile:

$$\bar{Q}_X(p) = \operatorname{argmin}_x \{\, x + \bar{\mathcal{V}}_p(X - x) \,\}$$

for the right choice of a “regret” functional $\bar{\mathcal{V}}_p$. Such a formula is needed for the purpose
of generalizing “quantile regression” to “superquantile regression” in the framework of [25] and
[24]. Specifically, from $\bar{\mathcal{V}}_p(X)$ as “superquantile regret,” the functional $\bar{\mathcal{E}}_p(X) = \bar{\mathcal{V}}_p(X) - E[X]$
will be the right substitute for Koenker-Bassett error in that generalization. Expressions for
superquantile regret and superquantile error that serve in this manner have not previously
been identified.


2  Monotone  Relations  in  Convex  Analysis 

In  this  section  we  review  facts  about  monotone  relations  and  the  convex  functions  associated 
with  them  in  order  to  lay  a  foundation  for  analyzing  the  connections  indicated  in  Section  1. 
In  that  analysis,  carried  out  in  Section  3,  the  x  variable  will  have  a  quantile  role  and  the  p 
variable  will  be  associated  with  probability,  but  for  now  both  are  abstract  variables  with  roles 
completely  interchangeable. 

Definition (monotonicity and maximal monotonicity). A set $\Gamma$ of pairs $(x, p) \in \mathbb{R} \times \mathbb{R}$ is said
to give a monotone relation if

$$(x_1 - x_2)(p_1 - p_2) \ge 0 \quad\text{for all } (x_1, p_1) \text{ and } (x_2, p_2) \text{ in } \Gamma, \tag{2.1}$$

so that either $(x_1, p_1) \le (x_2, p_2)$ or $(x_1, p_1) \ge (x_2, p_2)$ in the usual coordinatewise ordering of
vectors in $\mathbb{R} \times \mathbb{R}$. In other words, a monotone relation is a subset of $\mathbb{R} \times \mathbb{R}$ that is totally
ordered in that partial ordering. A monotone relation $\Gamma$ is maximal if it cannot be enlarged
without destroying the total ordering; there is no monotone relation $\Gamma' \supset \Gamma$ with $\Gamma' \ne \Gamma$.

Any  monotone  relation  can  be  extended  to  a  maximal  monotone  relation  (not  necessarily 
in  only  one  way).  Maximal  monotonicity  was  introduced  in  1960  by  Minty  [13]  in  the  study  of 
relations  between  variables  like  current  and  voltage  in  electrical  networks  and  their  analogs  in 
other  kinds  of  networks. 

The symmetry in the roles of the two variables in monotonicity has the consequence that if
$\Gamma$ is a monotone relation, then the inverse relation $\Gamma^{-1}$, defined by

$$\Gamma^{-1} = \{(p, x) \mid (x, p) \in \Gamma\}, \tag{2.2}$$

is  likewise  monotone.  Maximality  passes  over  in  this  manner  as  well. 

A  maximal  monotone  relation  has  the  graphical  appearance  of  an  unbounded  continuous 
curve  that  “trends  from  southwest  to  northeast”  and  may  incorporate  horizontal  and  vertical 
segments.  It  may  even  begin  or  end  with  such  a  segment  of  infinite  length.  As  extreme  cases, 
an  entire  horizontal  line  gives  a  maximal  monotone  relation  and  so  does  an  entire  vertical  line. 
The union of the nonnegative $x$-axis with the nonpositive $p$-axis is likewise a maximal monotone
relation, moreover one which very commonly arises in applications (not tied to probability). It
is the “infinite gamma” shape of that relation that earlier suggested the notation $\Gamma$.

A  noteworthy  feature  of  a  maximal  monotone  relation  is  its  canonical  parameterization  by 
an  auxiliary  variable  t: 

    For a maximal monotone relation $\Gamma$ and any $t \in (-\infty, \infty)$,
    the line $x + p = t$ intersects $\Gamma$ in a unique point $(x(t), p(t))$,        (2.3)
    and $x(t)$ and $p(t)$ are Lipschitz continuous as functions of $t$.

Put in another way, the graph of a maximal monotone relation is a sort of manifold that is
globally “lipeomorphic” to the real line. This striking property, so function-like, makes up for
the disadvantage, to classical eyes, of allowing the graph to contain vertical segments.
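
The parameterization in (2.3) can be explored numerically. The sketch below is an illustration under assumptions of our own: the relation is the graph of a unit step at 0 with its vertical gap filled in, described by hypothetical left- and right-continuous selections `gamma_minus` and `gamma_plus`, and the intersection with the line x + p = t is located by bisection inside a fixed bracket. It is not a method taken from the references.

```python
def gamma_minus(x):
    """Left-continuous selection of the filled-in unit step at 0."""
    return 1.0 if x > 0 else 0.0

def gamma_plus(x):
    """Right-continuous selection of the filled-in unit step at 0."""
    return 1.0 if x >= 0 else 0.0

def minty_point(g_minus, g_plus, t, lo=-1e6, hi=1e6, iters=200):
    """Bisect for x with g_minus(x) <= t - x <= g_plus(x); return (x, t - x).

    The bracket [lo, hi] is assumed to contain the intersection point.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + g_plus(mid) < t:
            lo = mid                  # the intersection lies to the right of mid
        elif mid + g_minus(mid) > t:
            hi = mid                  # the intersection lies to the left of mid
        else:
            return mid, t - mid       # (mid, t - mid) belongs to the relation
    x = 0.5 * (lo + hi)
    return x, t - x

ts = [-3.0, -1.0, 0.0, 0.25, 0.5, 1.0, 2.0]
points = [minty_point(gamma_minus, gamma_plus, t) for t in ts]

# Both coordinates should change no faster than t itself (Lipschitz constant 1).
lipschitz_ok = all(
    abs(points[i][0] - points[j][0]) <= abs(ts[i] - ts[j]) + 1e-9
    and abs(points[i][1] - points[j][1]) <= abs(ts[i] - ts[j]) + 1e-9
    for i in range(len(ts)) for j in range(len(ts))
)
print(points[4], lipschitz_ok)
```

For t between 0 and 1 the intersection sits on the filled-in vertical segment at x = 0, so only p(t) moves; outside that range only x(t) moves. This is how the parameterization passes smoothly through a vertical segment that would be a discontinuity of the underlying step function.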

An exposition of the theory of maximal monotone relations which covers (2.3) and other
properties yet to be mentioned is available in [26, Chapter 12], where the subject is extended
beyond subsets of $\mathbb{R} \times \mathbb{R}$ to subsets of $\mathbb{R}^n \times \mathbb{R}^n$. (In the higher dimensional setting, monotonicity
becomes a generalization of positive-semidefiniteness.) Some aspects are also in the earlier book
[17, Section 24]. A version of the subject dedicated to extending the original network ideas of
Minty, and offering many examples, is in [19, Chapter 8].

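Some basic convexity properties are obvious from the Minty parameterization. For instance,
the domain and range of a maximal monotone relation $\Gamma$, namely

$$\operatorname{dom} \Gamma = \{x \mid (x, p) \in \Gamma \text{ for some } p\}, \qquad \operatorname{rge} \Gamma = \{p \mid (x, p) \in \Gamma \text{ for some } x\}, \tag{2.4}$$

are nonempty intervals, although not necessarily closed, while the sets

$$\Gamma(x) = \{p \mid (x, p) \in \Gamma\}, \qquad \Gamma^{-1}(p) = \{x \mid (x, p) \in \Gamma\}, \tag{2.5}$$

are closed intervals with $\Gamma(x) \ne \emptyset$ when $x \in \operatorname{dom} \Gamma$, and $\Gamma^{-1}(p) \ne \emptyset$ when $p \in \operatorname{rge} \Gamma$. Clearly
$\operatorname{dom} \Gamma^{-1} = \operatorname{rge} \Gamma$ and $\operatorname{rge} \Gamma^{-1} = \operatorname{dom} \Gamma$.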

The connection between maximal monotone relations $\Gamma$ and nondecreasing functions $\gamma$ on
$(-\infty, \infty)$ is elementary and closely reflects the special case of distribution functions considered
in Section 1. Suppose $\gamma : (-\infty, \infty) \to [-\infty, \infty]$ is nondecreasing and not identically $-\infty$ or
identically $\infty$. Then there are left and right limits

$$\gamma^-(x) = \lim_{x' \nearrow x} \gamma(x'), \qquad \gamma^+(x) = \lim_{x' \searrow x} \gamma(x'), \tag{2.6}$$

with $\gamma^-(x) \le \gamma(x) \le \gamma^+(x)$. They define functions $\gamma^-$ and $\gamma^+$ which are left-continuous and
right-continuous, respectively. A maximal monotone relation $\Gamma$ is obtained by taking

$$\Gamma = \{(x, p) \in \mathbb{R} \times \mathbb{R} \mid \gamma^-(x) \le p \le \gamma^+(x)\}. \tag{2.7}$$

The original $\gamma$ has no direct role in this and could be replaced by either $\gamma^+$ or $\gamma^-$ from the
start, because $(\gamma^+)^- = \gamma^-$ and $(\gamma^+)^+ = \gamma^+$, whereas $(\gamma^-)^- = \gamma^-$ and $(\gamma^-)^+ = \gamma^+$. Conversely,
given a maximal monotone relation $\Gamma$ one can define

$$\begin{aligned}
&\gamma^-(x) = \min\{p \mid (x, p) \in \Gamma\} \text{ and } \gamma^+(x) = \max\{p \mid (x, p) \in \Gamma\} \text{ for } x \in \operatorname{dom} \Gamma, \\
&\gamma^-(x) = \gamma^+(x) = -\infty \text{ at points } x \text{ to the left of } \operatorname{dom} \Gamma \text{ (if any)}, \\
&\gamma^-(x) = \gamma^+(x) = \infty \text{ at points } x \text{ to the right of } \operatorname{dom} \Gamma \text{ (if any)},
\end{aligned} \tag{2.8}$$

to get a pair of nondecreasing functions $\gamma^-$ and $\gamma^+$, one continuous from the left and one
continuous from the right, which produce $\Gamma$ through (2.7).

Subdifferentiation. The connection between maximal monotone relations and the
subdifferentiation of convex functions will be explained next. A proper convex function on $(-\infty, \infty)$
is a function $f : (-\infty, \infty) \to (-\infty, \infty]$ that is not $\equiv \infty$ and satisfies

$$f((1 - \tau)x + \tau x') \le (1 - \tau) f(x) + \tau f(x') \quad\text{for all } \tau \in (0, 1) \text{ and all } x, x'. \tag{2.9}$$

In terms of the effective domain and epigraph of $f$, defined by

$$\operatorname{dom} f = \{x \mid f(x) < \infty\}, \qquad \operatorname{epi} f = \{(x, v) \mid f(x) \le v < \infty\}, \tag{2.10}$$

the definition is equivalent to saying that the proper convex functions are the functions
$f : (-\infty, \infty) \to (-\infty, \infty]$ for which $\operatorname{epi} f$ is a nonempty convex subset of $\mathbb{R} \times \mathbb{R}$, or equivalent
on the other hand to taking a nonempty interval $I$, a finite convex function $f$ on $I$, and then
defining $f(x) = \infty$ for $x \notin I$; then $\operatorname{dom} f = I$.

A proper convex function $f$ is said to be closed when it is lower semicontinuous, i.e., has its
lower level sets $\{\, x \mid f(x) \le c \,\}$ closed, for all $c \in \mathbb{R}$. This holds if and only if $\operatorname{epi} f$ is closed in
$\mathbb{R} \times \mathbb{R}$. Because a finite convex function on an open interval is necessarily continuous, a proper
convex function has to be continuous except perhaps at the endpoints of $\operatorname{dom} f$. Closedness
thus refers only to behavior at those endpoints: it requires that $f(x_k) \to f(x)$ when $x$ is an
endpoint and a sequence of points $x_k$ in $\operatorname{dom} f$ tends to $x$.

For a proper convex function $f$ and any $x \in \operatorname{dom} f$, the left-derivative and the right-derivative
of $f$, namely,

$f'^{-}(x) = \lim_{x' \nearrow x} \frac{f(x') - f(x)}{x' - x}, \qquad f'^{+}(x) = \lim_{x' \searrow x} \frac{f(x') - f(x)}{x' - x},$  (2.11)

exist with $f'^{-}(x) \le f'^{+}(x)$. The “set-valued” mapping $\partial f$ defined by

$\partial f(x) = \{\, p \in \mathbb{R} \mid f'^{-}(x) \le p \le f'^{+}(x) \,\}$ for $x \in \operatorname{dom} f$,
$\partial f(x) = \emptyset$ for $x \notin \operatorname{dom} f$,  (2.12)

is called the subdifferential of $f$. When $f'^{-}(x) = f'^{+}(x)$, this common value (if finite) is the
derivative $f'(x)$. That holds for all but countably many points in the interior of $\operatorname{dom} f$ because
of the convexity of $f$: the one-sided derivatives, as functions of $x$, are nondecreasing and
respectively left-continuous and right-continuous, having $(f'^{-})^{+} = f'^{+}$ and $(f'^{+})^{-} = f'^{-}$.

The key fact about subdifferentials $\partial f$ in general, going beyond the case of single-valuedness,
is this:

for a proper convex function $f$ that is closed, the graph of $\partial f$, namely
$\operatorname{gph} \partial f = \{\, (x,p) \mid p \in \partial f(x) \,\}$,
is a maximal monotone relation $\Gamma$; moreover every maximal monotone
relation $\Gamma$ is the graph of $\partial f$ for some closed proper convex function $f$,
and such a function $f$ is uniquely determined up to an additive constant.  (2.13)

The first part of this statement stems from the observation that if the one-sided derivatives in
(2.12) are extended outside of $\operatorname{dom} f$ by taking

$f'^{-}(x) = f'^{+}(x) = -\infty$ at points $x$ to the left of $\operatorname{dom} f$ (if any),
$f'^{-}(x) = f'^{+}(x) = \infty$ at points $x$ to the right of $\operatorname{dom} f$ (if any),  (2.14)

one gets as $\gamma^- = f'^{-}$ and $\gamma^+ = f'^{+}$ a left-continuous/right-continuous pair on all of $(-\infty,\infty)$
for which the $\Gamma$ associated by (2.7) is $\operatorname{gph} \partial f$. The second part is established by taking for
a given maximal monotone relation $\Gamma$ the corresponding left-continuous/right-continuous pair
$\gamma^-$, $\gamma^+$, and for any $\gamma$ between them and any $x_0 \in \operatorname{dom}\Gamma$ defining

$f(x) = \int_{x_0}^{x} \gamma(t)\,dt + c$ for an arbitrary constant $c$.  (2.15)

It turns out then that $f$ is a closed proper convex function having $\gamma^- = f'^{-}$ and $\gamma^+ = f'^{+}$.
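
As a small numerical illustration of (2.15), the following Python sketch integrates a nondecreasing function with a jump and checks that the resulting function is convex with the two ends of the jump as its one-sided derivatives. The particular `gamma`, the helper name `f_from_gamma`, and the use of the midpoint rule are assumptions made for this illustration only.

```python
def gamma(t):
    """Hypothetical nondecreasing function with a jump from -1 to 1 at t = 0."""
    return -1.0 if t < 0 else 1.0 + t


def f_from_gamma(x, x0=0.0, c=0.0, n=400):
    """Approximate f(x) = integral of gamma from x0 to x, plus c, by the midpoint rule."""
    h = (x - x0) / n
    return c + h * sum(gamma(x0 + (i + 0.5) * h) for i in range(n))


# One-sided difference quotients at the jump point x = 0.
h = 1e-3
left_slope = (f_from_gamma(0.0) - f_from_gamma(-h)) / h    # close to gamma^-(0) = -1
right_slope = (f_from_gamma(h) - f_from_gamma(0.0)) / h    # close to gamma^+(0) = 1

# Midpoint convexity on a grid: f((a+b)/2) <= (f(a)+f(b))/2, up to rounding.
grid = [i / 4.0 for i in range(-8, 9)]
is_convex = all(
    f_from_gamma((a + b) / 2) <= (f_from_gamma(a) + f_from_gamma(b)) / 2 + 1e-9
    for a in grid for b in grid
)
print(left_slope, right_slope, is_convex)
```

In this example the subdifferential at 0 is the whole interval between the two slopes, which is the vertical segment of $\Gamma$ that fills the jump of `gamma`.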

Dualization through the Legendre-Fenchel transform. Duality is the topic reviewed
next. Its central feature is a one-to-one correspondence among closed proper convex functions
which unites them in pairs. Here we only look at it in the one-dimensional setting, but in
multi-dimensional and even infinite-dimensional convex analysis it is the repository of virtually
every duality property that is known; see [17, 18, 26].

For any closed proper convex function $f$ on $(-\infty,\infty)$ the formula
$f^*(p) = \sup_x \{\, xp - f(x) \,\}$
defines a closed proper convex function $f^*$ on $(-\infty,\infty)$ such that
$f(x) = \sup_p \{\, xp - f^*(p) \,\} = (f^*)^*(x)$.  (2.16)

The passage from $f$ to $f^*$ is the Legendre-Fenchel transform. Accompanying these formulas are
the subdifferential rules that

$\partial f^*(p) = \operatorname{argmax}_x \{\, xp - f(x) \,\}, \qquad \partial f(x) = \operatorname{argmax}_p \{\, px - f^*(p) \,\}.$  (2.17)

A major consequence for purposes here is that

for any conjugate pair of closed proper convex functions, $f$ and $f^*$,
one has $\partial f^* = (\partial f)^{-1}$, meaning that $x \in \partial f^*(p) \iff p \in \partial f(x)$.  (2.18)

Thus, the maximal monotone relations associated with $f$ and $f^*$ are inverse to each other. Note
that as special cases of (2.17) and (2.18) one has

$\inf f = -f^*(0), \quad \operatorname{argmin} f = \partial f^*(0), \quad \inf f^* = -f(0), \quad \operatorname{argmin} f^* = \partial f(0).$  (2.19)
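
The formulas (2.16) and (2.19) can be tried out numerically. The sketch below replaces the supremum by a maximum over a finite grid, which is an approximation introduced here for illustration; the quadratic test function and the names `conjugate`, `x_grid` and `p_grid` are likewise assumptions, not part of the original development.

```python
def conjugate(fn, grid):
    """Discretized version of (2.16): p -> max over grid points x of x*p - fn(x)."""
    table = [(x, fn(x)) for x in grid]
    return lambda p: max(x * p - fx for x, fx in table)


f = lambda x: 0.5 * (x - 1.0) ** 2 + 2.0           # closed proper convex, minimized at x = 1
x_grid = [i / 100.0 for i in range(-1000, 1001)]   # x in [-10, 10]
p_grid = [i / 20.0 for i in range(-60, 61)]        # p in [-3, 3]

f_star = conjugate(f, x_grid)            # in closed form, f*(p) = p + p**2/2 - 2
f_star_star = conjugate(f_star, p_grid)  # biconjugate, which should recover f

inf_f = min(f(x) for x in x_grid)
# An element of argmax_x {x*0 - f(x)}, i.e. of the subdifferential of f* at 0.
argmax_at_zero = max(x_grid, key=lambda x: x * 0.0 - f(x))

print(f_star(0.0), inf_f, argmax_at_zero, f_star_star(0.5), f(0.5))
```

Here the printed values agree with $\inf f = -f^*(0)$ and $\operatorname{argmin} f = \partial f^*(0)$, and the biconjugate reproduces $f$ at the test point, as long as the maximizers fall inside the chosen grids.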


Set convergence and its variants. Finally, notions of convergence that are natural to
convex analysis need to be explained, particularly because they hardly enter the standard frame
of analysis (although they should). We keep to the context of $\mathbb{R}^2$ because that is all we require,
but a full theory in finite dimensions is provided in [26, Chapter 4].

For a nonempty closed subset $S$ of $\mathbb{R}^2$, the associated distance function is

$d_S(u) = \min_{w \in S} \|u - w\|$ for the Euclidean norm $\|\cdot\|$.

(Any norm would do equally well.) This function $d_S$ is nonnegative with $S = \{\, u \mid d_S(u) = 0 \,\}$
and it is Lipschitz continuous with Lipschitz constant 1. We are concerned with a sequence of
nonempty closed subsets $S_k$ in $\mathbb{R}^2$ and the issue of whether $S_k$ “converges” to $S$ as $k \to \infty$,
with set-convergence in the Kuratowski/Painlevé sense intended. Although there are numerous
characterizations of such convergence (cf. [26, Chapter 4]), it suffices here to concentrate on a
description that is easy to visualize:

$\lim_{k \to \infty} S_k = S \iff \lim_{k \to \infty} d_{S_k}(u) = d_S(u)$ for every $u$.

Because of the Lipschitz continuity, such pointwise convergence entails uniform convergence of
the distance functions on all bounded sets.

Two convergence notions more closely tuned to our present discussion are built around such
set convergence. First,

graphical convergence to $\Gamma$ of a sequence of maximal monotone
relations $\Gamma_k$ refers to their convergence as subsets of $\mathbb{R} \times \mathbb{R}$.  (2.20)

Second,

epigraphical convergence to $f$ of a sequence of closed proper convex functions
$f_k$ on $(-\infty,\infty)$ refers to the set-convergence of $\operatorname{epi} f_k$ to $\operatorname{epi} f$ in $\mathbb{R} \times \mathbb{R}$.  (2.21)

Two celebrated results about these notions underscore their fundamental significance.

A sequence of closed proper convex functions $f_k$ epi-converges to $f$ if and
only if the maximal monotone relations $\Gamma_k = \operatorname{gph} \partial f_k$ converge graphically
to $\Gamma = \operatorname{gph} \partial f$, while $f_k(x_k) \to f(x)$ for some sequence $x_k \to x \in \operatorname{dom} f$.  (2.22)

On the other hand,

A sequence of closed proper convex functions $f_k$ epi-converges to such an $f$
if and only if their conjugate functions $f_k^*$ epi-converge to the conjugate $f^*$.  (2.23)

In other words, the Legendre-Fenchel transform is continuous with respect to epi-convergence.

In general, it is possible for a sequence of functions to epi-converge without converging
pointwise everywhere, and conversely. However, in the applications we will make involving
random variables some degree of pointwise convergence can be utilized. This comes from the
following characterization.

For closed proper convex functions $f_k$ and $f$ on $(-\infty,\infty)$ with the
same nonempty open interval $I$ as interior of $\operatorname{dom} f$ and $\operatorname{dom} f_k$, the
epi-convergence of $f_k$ to $f$ is equivalent to the pointwise convergence
of $f_k$ to $f$ on the interval $I$, or for that matter on a dense subset of $I$,
in which case the convergence is uniform on all compact subsets of $I$.  (2.24)

Graphical convergence of maximal monotone relations can likewise be furnished with
characterizations based on pointwise convergence:

for maximal monotone $\Gamma_k$ and $\Gamma$ associated with nondecreasing
functions $\gamma_k$ and $\gamma$, graphical convergence of $\Gamma_k$ to $\Gamma$ corresponds
to pointwise convergence of $\gamma_k$ to $\gamma$ at all continuity points of $\gamma$, or
equivalently, to pointwise convergence of $\gamma_k$ to $\gamma$ almost everywhere.  (2.25)
On an open interval $I$ where the functions $\gamma_k$ and $\gamma$ are finite, such pointwise convergence
almost everywhere is furthermore equivalent to having

$\int_a^b |\gamma_k(x) - \gamma(x)|\,dx \to 0$ for all $[a,b] \subset I$,  (2.26)

as seen through application of Lebesgue dominated convergence in the context of these functions
being nondecreasing.
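
A hedged illustration of graphical convergence in the sense of (2.20) and (2.25): the sketch below uses the hypothetical sequence $\gamma_k(x) = \max\{-1, \min\{1, kx\}\}$, whose graphs converge to the relation obtained from the sign function by filling in the vertical segment $\{0\} \times [-1,1]$. The distance from the point $u = (0, 0.5)$ to the graph of $\gamma_k$ tends to 0, even though $\gamma_k(0) = 0$ for every $k$. The function names and the truncation parameter `far` are assumptions of this example.

```python
import math


def dist_to_segment(u, a, b):
    """Euclidean distance from point u to the segment with endpoints a and b."""
    (ux, uy), (ax, ay), (bx, by) = u, a, b
    dx, dy = bx - ax, by - ay
    t = ((ux - ax) * dx + (uy - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))                      # clamp the projection to the segment
    return math.hypot(ux - (ax + t * dx), uy - (ay + t * dy))


def dist_to_graph(u, k, far=1e6):
    """Distance from u to the graph of gamma_k(x) = max(-1, min(1, k*x)).

    The graph is a polyline; its two horizontal rays are truncated at |x| = far.
    """
    knots = [(-far, -1.0), (-1.0 / k, -1.0), (1.0 / k, 1.0), (far, 1.0)]
    return min(dist_to_segment(u, a, b) for a, b in zip(knots, knots[1:]))


u = (0.0, 0.5)
dists = [dist_to_graph(u, k) for k in (1, 10, 100, 1000)]
print(dists)   # decreasing toward 0, the distance from u to the limit relation
```

The point $u$ lies on the vertical segment of the limit relation, so its distance to the limit is 0; this is consistent with (2.25), which asks for pointwise convergence of $\gamma_k$ only at continuity points of the limit function.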

Second derivatives. Convex functions are known generally to be twice differentiable
almost everywhere. Where does this enter our picture? Monotone relations provide a helpful
graphical view.

For a closed proper convex function $f$ the twice differentiability of $f$ at $x \in \operatorname{dom} f$ means
that the one-sided derivative functions $f'^{-}$ and $f'^{+}$ agree at $x$ and are differentiable there.
Graphically in terms of the monotone relation $\Gamma$ giving $\partial f$, an equivalent statement is that
there is a unique $p$ such that $(x,p) \in \Gamma$, and furthermore a nonvertical tangent line to $\Gamma$ at
$(x,p)$.$^{9}$ Here $f'(x) = p$ and the slope of the tangent is $f''(x)$. This slope must of course be
nonnegative.

Let us call $(x,p)$ a nonsingular point of $\Gamma$ if there is a tangent line there which is nonvertical
and also nonhorizontal. This corresponds to $f$ having a nonzero second derivative at $x$. The
symmetry in this notion provides us then with the following equivalences:

$f$ has $f'(x) = p$ and second derivative $f''(x) > 0$
$\iff$ $(x,p)$ is a nonsingular point of $\Gamma$
$\iff$ $(p,x)$ is a nonsingular point of $\Delta = \Gamma^{-1}$  (2.27)
$\iff$ $f^*$ has $f^{*\prime}(p) = x$ and second derivative $f^{*\prime\prime}(p) > 0$,
in which case the second derivatives are reciprocal, $f^{*\prime\prime}(p) = 1/f''(x)$.

Beyond passing to second derivatives in this manner, one can think of the maximal monotone
relation $\Gamma$ as directly associated with a measure “$d\Gamma$” defined in Lebesgue-Stieltjes manner
through the nondecreasing functions associated with it. (Vertical segments in $\Gamma$ correspond to
atoms in this measure, and the continuous nondecreasing function obtained by shrinking them
out gives the rest of the measure in the usual way.) Likewise, the inverse relation $\Delta$ yields a
measure “$d\Delta$.” These measures are reciprocal in a certain sense that encompasses (2.27). They
can be construed as the generalized second derivatives of $f$ and $f^*$.
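
The reciprocity of second derivatives in (2.27) can be checked on a standard conjugate pair. The sketch below assumes $f(x) = e^x$, whose conjugate is $f^*(p) = p \log p - p$ for $p > 0$; this choice, the helper `second_derivative`, and the central-difference step are illustrative assumptions.

```python
import math

f = math.exp                                  # f(x) = e^x, so f'(x) = f''(x) = e^x


def f_star(p):
    """Conjugate of exp on p > 0: sup_x {x*p - e^x} is attained at x = log p."""
    return p * math.log(p) - p


def second_derivative(fn, t, h=1e-4):
    """Central finite-difference approximation of the second derivative of fn at t."""
    return (fn(t + h) - 2.0 * fn(t) + fn(t - h)) / (h * h)


x = 0.7
p = math.exp(x)                               # p = f'(x), so (x, p) lies on gph of the derivative
product = second_derivative(f, x) * second_derivative(f_star, p)
print(product)                                # close to 1
```

The product being close to 1 reflects that the tangent slopes of a relation and of its inverse at corresponding points are reciprocal.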


3  Back  to  Random  Variables 

We  turn  now  to  applying  the  general  results  in  Section  2  to  random  variables  in  the  setting  of 
Section  1.  We  start  with  monotonicity  and  go  on  to  duality.  Then  we  see  where  this  leads  us 
with  convergence  issues.  Supporting  facts  about  expectations  need  to  be  recorded  beforehand. 
To  avoid  complications  that  are  inessential  for  our  purposes,  we  make  the  assumption  that 

henceforth all random variables $X$ have $E[\,|X|\,] < \infty$.  (3.1)

Then $E[X]$ is well defined and finite, in particular.

$^{9}$ For tangency in general terms, see [26, Chapter 6].

Expectations with respect to the probability measure on $(-\infty,\infty)$ induced by a random
variable $X$ take the form of Lebesgue-Stieltjes integrals with respect to $F_X$. One has

$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,dF_X(x)$  (3.2)

for any (measurable) function $g$ such that the integrand $g(x)$ is integrable with respect to the
probability measure in question, or at least is bounded from below by something integrable;
cf. Billingsley [3, Section 21].$^{10}$ An expression of the same expectation in terms of the quantile
function $Q_X$ instead of the distribution function $F_X$ is

$E[g(X)] = \int_0^1 g(Q_X(p))\,dp,$  (3.3)

again as long as $g(Q_X(p))$ is bounded from below by something integrable. This holds because
the integrals on the right of (3.2) and (3.3) agree through a change-of-variable rule; cf. Billingsley
[3, Theorem 16.13].$^{11}$ In particular,

$E[X] = \int_{-\infty}^{\infty} x\,dF_X(x) = \int_0^1 Q_X(p)\,dp$ (finite),
$E[\,|X|^r\,] = \int_{-\infty}^{\infty} |x|^r\,dF_X(x) = \int_0^1 |Q_X(p)|^r\,dp$ for $r \ge 1$.  (3.4)

Also conforming to this rule is the equivalence of the alternative definitions in (1.5) and
(1.6) of the superquantile $\bar{Q}_X(p)$ for $p \in (0,1)$. This equivalence can be identified with the
version of the first equation in (3.4) that results from replacing $F_X$ by the $p$-tail distribution
function $F_X^{[p]}$ described right after (1.5) and replacing $Q_X$ accordingly by the quantile function
$Q_X^{[p]}$ for $F_X^{[p]}$, with $Q_X^{[p]}(t) = Q_X(p + t(1-p))$ for $t \in (0,1)$. Since $Q_X^{[p]}(t) \ge Q_X(p) > -\infty$, the
integrand in the quantile integral is bounded from below by an integrable function on $(0,1)$,
and the equivalence between (1.5) and (1.6) is thereby justified.
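
For a distribution with finitely many equally likely outcomes, the agreement of (3.2) with (3.3) and the tail-integral form of the superquantile can be verified directly. The sketch below is only an illustration: the sample `xs`, the helper names, and the midpoint rule are assumptions, and the tail formula $\frac{1}{1-p}\int_p^1 Q_X(p')\,dp'$ is taken to be the content of (1.6), as suggested by its use in the proof of Theorem 2.

```python
import math


def quantile(sorted_xs, p):
    """Left-continuous quantile function of the empirical distribution, for 0 < p <= 1."""
    n = len(sorted_xs)
    return sorted_xs[max(0, math.ceil(n * p - 1e-12) - 1)]


def integrate_quantile(g, sorted_xs, a=0.0, b=1.0, m=1000):
    """Midpoint-rule value of the integral of g(Q_X(p)) dp over (a, b)."""
    h = (b - a) / m
    return h * sum(g(quantile(sorted_xs, a + (j + 0.5) * h)) for j in range(m))


xs = sorted([3.0, -1.0, 4.0, 1.0, 5.0])        # five equally likely outcomes
square = lambda v: v * v

lhs = sum(square(v) for v in xs) / len(xs)     # E[g(X)] as in (3.2), here a finite sum
rhs = integrate_quantile(square, xs)           # E[g(X)] via quantiles as in (3.3)

p = 0.6
tail_mean = (4.0 + 5.0) / 2.0                  # mean of the upper 40% of outcomes
tail_integral = integrate_quantile(lambda v: v, xs, a=p, b=1.0) / (1.0 - p)
print(lhs, rhs, tail_mean, tail_integral)
```

Because the quantile function is a step function and the midpoints never land on its jumps, the midpoint rule is exact here up to rounding.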

$^{10}$ Recall that the integral of a nonnegative (measurable) function is always well defined but might be $\infty$. A
function is “integrable” if its absolute value has finite integral. The integral of any function that is bounded
below by an integrable function is likewise well defined as a finite value or $\infty$.

$^{11}$ The $dF_X$ measure on $(-\infty,\infty)$ is the one induced from the $dp$ measure on $(0,1)$ by the function $Q_X$.

Maximal monotonicity from distributions and quantiles. The distribution function
$F_X$, which is nondecreasing and right-continuous, has a left-continuous counterpart $F_X^-$. The
monotonicity construction in Section 2, when applied to this pair, yields the relation $\Gamma_X$ described
in Section 1 in terms of “filling in the vertical gaps”:

$\Gamma_X = \{\, (x,p) \in \mathbb{R} \times \mathbb{R} \mid F_X^-(x) \le p \le F_X(x) \,\}.$  (3.5)

Hence $\Gamma_X$ is a maximal monotone relation. One can proceed similarly with the nondecreasing
left-continuous function $Q_X$ by extending it in the natural way beyond $(0,1)$ with

$Q_X(1) = \lim_{p \nearrow 1} Q_X(p), \qquad Q_X(p) = \infty$ for $p > 1$, $\qquad Q_X(p) = -\infty$ for $p \le 0$,  (3.6)

so as to get a nondecreasing left-continuous function on $(-\infty,\infty)$. Its extended right-continuous
counterpart $Q_X^+$ has $Q_X^+(0) = \lim_{p \searrow 0} Q_X(p)$ and $Q_X^+(1) = \infty$. The relation $\Delta_X$ described in
Section 1 is then

$\Delta_X = \{\, (p,x) \in \mathbb{R} \times \mathbb{R} \mid Q_X(p) \le x \le Q_X^+(p) \,\},$  (3.7)

and it, too, is therefore a maximal monotone relation. Moreover these relations are inverse to
each other through the reciprocal formulas (1.2) and (1.3) for passing between $F_X$ and $Q_X$:

$(x,p) \in \Gamma_X \iff (p,x) \in \Delta_X$, i.e., $\Delta_X = \Gamma_X^{-1}$ and $\Gamma_X = \Delta_X^{-1}$.  (3.8)

This recalls the setting in (2.18) in which a pair of maximal monotone relations that are
the inverses of each other are the graphs of the subdifferentials of two convex functions that
are conjugate to each other. The construction of such functions is where we are now headed.

Superexpectation functions. A basic choice confronts us right away. We can pass from
$\Gamma_X$ to a convex function $f$ having it as the graph of $\partial f$, but an additive constant is thereby left
undetermined. An idea coming straight to mind is to look at $f(x) = \int_0^x F_X(x')\,dx'$, but that
has a big disadvantage for applications, as will be explained below. Another choice with a lot
behind it is taking $f$ to be the function$^{12}$

$F_X^{(2)}(x) = E[\max\{0, x - X\}] = \int_{-\infty}^{x} F_X(x')\,dx',$  (3.9)

which is finite$^{13}$ and convex with right-derivative $F_X$. Ogryczak and Ruszczynski showed in [16,
Theorem 3.1] that the conjugate of $F_X^{(2)}$ is the convex function given on $[0,1]$ by$^{14}$

$F_X^{(-2)}(p) = \int_0^p Q_X(p')\,dp',$  (3.10)

but equalling $\infty$ outside of $[0,1]$. It has $Q_X$ as its left derivative on $(0,1)$. In statistics, $F_X^{(2)}$
and $F_X^{(-2)}$ have long standing, but they emphasize the lower tail of $X$ instead of the upper tail.

Desiring something tuned instead to upper tail properties, Dentcheva and Martinez in [4]
introduced in parallel to (3.9) the “excess function”

$H_X(x) = E[\max\{0, X - x\}] = \int_x^{\infty} [1 - F_X(x')]\,dx',$  (3.11)

which likewise is finite and convex. They showed that its conjugate $H_X^*$ can be expressed on
$[0,1]$ in terms of (although not directly as) the function$^{15}$

$L_X(p) = \int_p^1 Q_X(p')\,dp'.$  (3.12)

However, $L_X$ is concave, not convex, and it has $-Q_X(p)$ as its left-derivative at $p$, while $H_X$
has $F_X(x) - 1$ as its right-derivative at $x$. Thus, this adaptation to a “cost” orientation of $X$
does not sit comfortably in our duality framework.

$^{12}$ This function is traditionally important in “stochastic dominance,” to be taken up in Section 4.

$^{13}$ The finiteness of the integral is assured under (3.1).

$^{14}$ This gives the “Lorenz curve” [11] associated with $X$.

$^{15}$ This is the “upper Lorenz function” in their terminology, although the format of notation is ours.
A different choice will therefore be made here instead. It is dictated in part by our interest
in coordinating with the “superquantiles” of random variables described in Section 1, as will
be apparent when we come to duality.

We define the superexpectation function $E_X$ associated with a random variable $X$ by

$E_X(x) = E[\max\{x, X\}] = \int_{-\infty}^{\infty} \max\{x, x'\}\,dF_X(x') = \int_0^1 \max\{x, Q_X(p)\}\,dp,$  (3.13)

with the value $E_X(x)$ being termed the superexpectation of $X$ at level $x$.$^{16}$
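
To make the definition (3.13) concrete, the following sketch evaluates the superexpectation for a distribution with five equally likely outcomes and compares its one-sided slopes at an atom with the distribution function. The sample `xs` and all function names are assumptions made for this illustration.

```python
def superexpectation(xs, x):
    """E_X(x) = E[max{x, X}] for equally likely outcomes xs."""
    return sum(max(x, v) for v in xs) / len(xs)


def excess(xs, x):
    """H_X(x) = E[max{0, X - x}] as in (3.11)."""
    return sum(max(0.0, v - x) for v in xs) / len(xs)


def cdf(xs, x):
    """F_X(x) = prob{X <= x}."""
    return sum(1 for v in xs if v <= x) / len(xs)


xs = [3.0, -1.0, 4.0, 1.0, 5.0]
mean = sum(xs) / len(xs)

h = 1e-6
right_slope = (superexpectation(xs, 1.0 + h) - superexpectation(xs, 1.0)) / h
left_slope = (superexpectation(xs, 1.0) - superexpectation(xs, 1.0 - h)) / h

print(right_slope, cdf(xs, 1.0))          # right slope matches F_X(1)
print(left_slope)                         # left slope matches prob{X < 1}
print(superexpectation(xs, -10.0), mean)  # far to the left, E_X equals E[X]
print(superexpectation(xs, 10.0))         # far to the right, E_X(x) equals x
```

At the atom $x = 1$ the two slopes differ, and the interval between them is the vertical segment that fills the jump of the distribution function there.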

Theorem 1 (characterization of superexpectations). The superexpectation function $E_X$ for
a random variable $X$ having $E[\,|X|\,] < \infty$ is a finite convex function on $(-\infty,\infty)$ which
corresponds subdifferentially to the monotone relation $\Gamma_X$ and the distribution function $F_X$
through

$\Gamma_X = \operatorname{gph} \partial E_X, \qquad F_X(x) = E_X'^{+}(x).$  (3.14)

It is nondecreasing with

$E_X(x) - x \ge 0, \qquad \lim_{x \to \infty} [E_X(x) - x] = 0, \qquad \lim_{x \to -\infty} E_X(x) = E[X]$  (3.15)

and has the additional convexity property that

$E_X(x) \le (1-\lambda) E_{X_0}(x) + \lambda E_{X_1}(x)$ when $X = (1-\lambda) X_0 + \lambda X_1$ with $0 < \lambda < 1$.  (3.16)

On the other hand, any convex function $f$ on $(-\infty,\infty)$ with the properties that

$f(x) - x \ge 0, \qquad \lim_{x \to \infty} [f(x) - x] = 0, \qquad \lim_{x \to -\infty} f(x) =$ a finite value,  (3.17)

is $E_X$ for a random variable $X$ having $E[\,|X|\,] < \infty$.


Proof. The asymptotics in (3.15) are evident from $\max\{x, x'\} - x = \max\{0, x' - x\} \ge 0$,
where $\max\{0, x' - x\}$ as a function of $x'$ decreases pointwise to 0 as $x$ tends to $\infty$, while
$\max\{x, x'\}$ decreases pointwise to $x'$ as $x$ tends to $-\infty$. To connect with $F_X$ giving the right
derivative, observe for $x' > x$ that

$\frac{\max\{x', t\} - \max\{x, t\}}{x' - x}$ equals 1 if $t \le x$, equals 0 if $t \ge x'$, and lies in $(0,1)$ if $t \in (x, x')$,

and therefore

$\operatorname{prob}\{X \le x\} \le \frac{E_X(x') - E_X(x)}{x' - x} \le \operatorname{prob}\{X \le x'\},$

where the left side equals $F_X(x)$ and the right side equals $F_X(x')$. In taking the limit on both
sides as $x' \searrow x$ and utilizing the right-continuity of $F_X$, we confirm that $E_X'^{+}(x) = F_X(x)$.

The additional property in (3.16) is a consequence of the convexity of $\max\{x, X\}$ with
respect to $X$ in the definition (3.13) of $E_X(x)$.

If a convex function $f$ has the properties in (3.17), it must be finite on $(-\infty,\infty)$ and
nondecreasing. Moreover its left-derivatives $f'^{-}(x)$ and right-derivatives $f'^{+}(x)$ must lie in $[0,1]$
and increase to 1 as $x$ tends to $\infty$ but decrease to 0 as $x$ tends to $-\infty$. Thus in particular,
the right-continuous function $f'^{+}$ meets the requirements of a distribution function $F_X$ for a
random variable $X$. □

$^{16}$ Relative to the excess function $H_X$ in (3.11), we clearly have $E_X(x) = H_X(x) + x$.

The properties in (3.17) say that the graph of $E_X$ is above, but asymptotic to, the 45-degree
line $y = x$. The additional convexity property in (3.16) is valuable for applications in stochastic
optimization, which often involve random variables $X(u)$ that depend linearly or convexly on a
parameter vector $u$. It is also a signal of the aptness of $E_X$ as our designated choice of a convex
function $f$ having $F_X$ as its right-derivative. This property also holds for $F_X^{(2)}$ as an antecedent
of $F_X$, but that choice concentrates on the lower tail instead of the upper tail. It is absent for
other seemingly natural choices, such as $f(x) = \int_0^x F_X(x')\,dx'$. In that case,

$\int_a^x F_X(x')\,dx' = E_X(x) - E_X(a)$ for any $a \in (-\infty,\infty)$,

since both sides have the same right-derivatives in $x$ and both vanish at $x = a$. Although $E_X(a)$, like
$E_X(x)$, is convex with respect to $X$, the difference $E_X(x) - E_X(a)$ lacks that property.

Conjugate superexpectations. Dualization of the superexpectation function $E_X$ through
the Legendre-Fenchel transform will be addressed next. This is where the superquantiles $\bar{Q}_X(p)$
of (1.5) and (1.6) come on stage.

The conjugacy claim in the following theorem is new only in its formulation, in view of the
conjugacy between (3.9) and (3.10) already established by Ogryczak and Ruszczynski in [16],
and the result of Dentcheva and Martinez in [4] about the relationship between the functions
in (3.11) and (3.12). However, the proof we supply takes a different route.

Theorem 2 (dualization of superexpectations). The closed proper convex function $E_X^*$ on
$(-\infty,\infty)$ that is conjugate to the superexpectation function $E_X$ on $(-\infty,\infty)$ is given by

$E_X^*(p) = -(1-p)\,\bar{Q}_X(p)$ for $p \in (0,1)$,
$E_X^*(p) = -E[X]$ for $p = 0$,
$E_X^*(p) = 0$ for $p = 1$,
$E_X^*(p) = \infty$ for $p \notin [0,1]$.  (3.18)

It is continuous relative to $[0,1]$, entailing

$\lim_{p \nearrow 1} (1-p)\,\bar{Q}_X(p) = 0, \qquad \lim_{p \searrow 0} \bar{Q}_X(p) = E[X],$  (3.19)

and it corresponds subdifferentially to the maximal monotone relation $\Delta_X = \Gamma_X^{-1}$ and the
quantile function $Q_X$ through

$\Delta_X = \operatorname{gph} \partial E_X^*, \qquad Q_X(p) = (E_X^*)'^{-}(p).$  (3.20)

On the other hand, any function $g$ on $(-\infty,\infty)$ that is finite convex and continuous on $[0,1]$
with $g(1) = 0$, but $g(p) = \infty$ for $p \notin [0,1]$, is $E_X^*$ for some random variable $X$.

Proof. Let $g$ denote the function of $p \in (-\infty,\infty)$ described by the right side of (3.18). It will
be demonstrated in steps that $g$ is a closed proper convex function having $E_X$ as its conjugate
$g^*$. That will tell us through (2.16) that $g$ is in turn the conjugate $E_X^*$. It will also confirm
the limits in (3.19), since a closed proper convex function on $(-\infty,\infty)$ is always continuous
relative to an interval on which it is finite, cf. [17, Corollary 7.5.1].

From the expression for $\bar{Q}_X(p)$ in (1.6), already justified as being equivalent to the one
in (1.5), we have $g(p) = -\int_p^1 Q_X(p')\,dp'$ for $p \in (0,1)$. This implies that $g'^{-}(p) = Q_X(p)$ and
$g'^{+}(p) = Q_X^+(p)$ on $(0,1)$. Since the limit of $-\int_p^1 Q_X(p')\,dp'$ as $p \nearrow 1$ is 0, while the limit as $p \searrow 0$ is
$-\int_0^1 Q_X(p')\,dp' = -E[X]$ by (3.4), $g$ is continuous relative to $[0,1]$. Since $Q_X$ is nondecreasing,
$g$ is also convex on $[0,1]$ and hence, in its extension outside of $[0,1]$ by $\infty$, is a closed proper
convex function on $(-\infty,\infty)$. Furthermore, the left- and right-derivative functions for $g$, as
extended in the manner of (2.14), are the functions $Q_X$ and $Q_X^+$ as extended in (3.6). The
graph of $\partial g$, as determined by definition from $g'^{-}$ and $g'^{+}$, is therefore the relation $\Delta_X$ in (3.7).

It follows then from (2.18) that the convex function $g^*$ conjugate to $g$ has the relation
$\Gamma_X = \Delta_X^{-1}$ as the graph of $\partial g^*$. Since $E_X$ is already known from Theorem 1 to have $\Gamma_X$ as the
graph of $\partial E_X$, the functions $g^*$ and $E_X$ can differ at most by a constant, $E_X = g^* + c$. On
taking conjugates again, we get $E_X^* = (g^* + c)^* = (g^*)^* - c = g - c$. Thus, to verify that $c = 0$,
confirming that $E_X^* = g$, it will suffice to show that $E_X^*(1) = 0$. For this we apply the formula
for the Legendre-Fenchel transform: $E_X^*(p) = \sup_x \{\, px - E_X(x) \,\}$ at $p = 1$. This gives us

$-E_X^*(1) = \inf_x \{\, -x + E[\max\{x, X\}] \,\} = \inf_x \{\, E[\max\{0, X - x\}] \,\},$

where the expectation of $\max\{0, X - x\}$ is always $\ge 0$ but approaches 0 as $x \to \infty$.

For the last part of the theorem, we note that for any $g$ as described there the function
$q = g'^{-}$ on $(0,1)$ is left-continuous and nondecreasing, with $g(p) = -\int_p^1 q(p')\,dp'$. In other words,
$q$ meets the requirement of being a quantile function $Q_X$ for which the right side of (3.18) can
be identified with $g$. Then $g$ must be the corresponding function $E_X^*$. □

Corollary (superquantile functions). The conjugate $E_X^*$ is uniquely determined by the
superquantile function $\bar{Q}_X$. Not only it, but also $E_X$, $F_X$ and $Q_X$, along with $\Gamma_X$ and $\Delta_X$, can
be reconstructed from knowledge of $\bar{Q}_X$. Moreover the following properties of a function $g$ on
$(0,1)$ are necessary and sufficient to have $g = \bar{Q}_X$ for a random variable $X$ with $E[\,|X|\,] < \infty$:

$(1-p)\,g(p)$ is concave in $p$ with $\lim_{p \nearrow 1} (1-p)\,g(p) = 0$, $\lim_{p \searrow 0} g(p) =$ a finite value.  (3.21)

Proof. Once $\bar{Q}_X$ has determined $E_X^*$ from (3.18), we get $E_X$ as the conjugate $(E_X^*)^*$. These
functions yield $F_X$ and $Q_X$ through one-sided differentiation, and we then have the monotone
relations $\Gamma_X$ and $\Delta_X$ as well. The conditions listed for a function $g$ correspond to the conditions
on $g$ at the end of Theorem 2. □

Besides this characterization, it is interesting to observe as a consequence of the formula
(1.6) for superquantiles $\bar{Q}_X(p)$ that

$\bar{Q}_X$ is a continuous increasing function of $p \in (0,1)$ with
$\bar{Q}_X'^{+}(p) = \frac{\bar{Q}_X(p) - Q_X^+(p)}{1-p} \le \frac{\bar{Q}_X(p) - Q_X(p)}{1-p} = \bar{Q}_X'^{-}(p).$  (3.22)

In contrast, $Q_X$ is only nondecreasing, not (strictly) increasing, and can be discontinuous.
There is no assurance that $Q_X$ has left-derivatives or right-derivatives apart from the general
dictum that a nondecreasing function is differentiable almost everywhere.
Example (exponential distributions). Let $X$ be exponentially distributed with parameter $\lambda > 0$.
Then the distribution function is $F_X(x) = 1 - \exp(-\lambda x)$ for $x \ge 0$, the superexpectation function is

$E_X(x) = x + (1/\lambda)\exp(-\lambda x)$ for $x \ge 0$,
$E_X(x) = 1/\lambda$ for $x < 0$,

and the conjugate superexpectation function has $E_X^*(p) = (1/\lambda)(p - 1)(1 - \log(1-p))$ for
$p \in [0,1)$. Quantiles and superquantiles are thus given on $(0,1)$ by

$Q_X(p) = -(1/\lambda)\log(1-p), \qquad \bar{Q}_X(p) = (1/\lambda)(1 - \log(1-p)).$
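
The closed forms in this example lend themselves to a numerical cross-check. The sketch below compares the superquantile with a midpoint-rule tail average of the quantile function, and the conjugate with a supremum taken over a finite grid. The rate value, the grid ranges, and the function names are assumptions chosen for the illustration, and both numerical procedures are approximations.

```python
import math

lam = 2.0   # rate parameter, chosen arbitrarily for the check


def Q(p):
    """Quantile function of the exponential distribution."""
    return -math.log(1.0 - p) / lam


def Qbar(p):
    """Superquantile, closed form from the example."""
    return (1.0 - math.log(1.0 - p)) / lam


def E(x):
    """Superexpectation, closed form from the example."""
    return x + math.exp(-lam * x) / lam if x >= 0 else 1.0 / lam


def E_conj(p):
    """Conjugate superexpectation on [0, 1), closed form from the example."""
    return (p - 1.0) * (1.0 - math.log(1.0 - p)) / lam


def tail_average(p, n=20000):
    """Midpoint-rule value of (1/(1-p)) times the integral of Q over (p, 1)."""
    h = (1.0 - p) / n
    return sum(Q(p + (i + 0.5) * h) for i in range(n)) * h / (1.0 - p)


def conj_by_sup(p):
    """Grid approximation of sup_x {p*x - E(x)} for x in [-5, 20]."""
    return max(p * x - E(x) for x in (i / 1000.0 for i in range(-5000, 20001)))


print(tail_average(0.9), Qbar(0.9))
print(conj_by_sup(0.9), E_conj(0.9))
```

The two printed pairs should agree to several digits; the small discrepancy in the first pair comes from the logarithmic singularity of the quantile function at $p = 1$.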


Our results further make available new estimates for work with superquantiles.

Theorem 3 (superquantile estimates). For $p \in [0,1)$, one has

(a) $|\bar{Q}_X(p) - \bar{Q}_Y(p)| \le \frac{1}{1-p}\,E[\,|X - Y|\,]$ when $E[\,|X|\,] < \infty$, $E[\,|Y|\,] < \infty$.

(b) $E[X] \le \bar{Q}_X(p) \le E[X] + \frac{1}{\sqrt{1-p}}\,\sigma(X)$ when $E[X^2] < \infty$, $\sigma(X)$ = standard deviation.

Proof. Observe first that $|\max\{x, a\} - \max\{x, b\}| \le |a - b|$. For the superexpectation functions
corresponding to $X$ and $Y$, this gives us

$|E_X(x) - E_Y(x)| \le E[\,|\max\{x, X\} - \max\{x, Y\}|\,] \le E[\,|X - Y|\,],$

or in other words, both $E_Y \le E_X + E[\,|X - Y|\,]$ and $E_X \le E_Y + E[\,|X - Y|\,]$. Applying the
Legendre-Fenchel transform, which reverses functional inequalities, we see that

$E_Y^* \ge E_X^* - E[\,|X - Y|\,], \qquad E_X^* \ge E_Y^* - E[\,|X - Y|\,],$

and consequently $|E_X^*(p) - E_Y^*(p)| \le E[\,|X - Y|\,]$ for $p \in [0,1)$. Then (3.18) yields (a).

For (b), we note that $-E_X^*(p) = \int_p^1 Q_X(p')\,dp' = \int_0^1 Q_X(p')\,I_{[p,1]}(p')\,dp'$ for the characteristic
function $I_{[p,1]}$ of the interval $[p,1]$, while recalling that $E[X] = \int_0^1 Q_X(p')\,dp'$. Then, by way of (3.18),
we have $0 \le (1-p)(\bar{Q}_X(p) - E[X]) = \int_0^1 (Q_X(p') - E[X])\,I_{[p,1]}(p')\,dp'$. The Cauchy-Schwarz
inequality provides now that

$(1-p)(\bar{Q}_X(p) - E[X]) \le \left[ \int_0^1 (Q_X(p') - E[X])^2\,dp' \right]^{1/2} \left[ \int_0^1 I_{[p,1]}(p')^2\,dp' \right]^{1/2},$

where the first factor on the right is $(E[\,|X - E[X]|^2\,])^{1/2} = \sigma(X)$ by (3.3) and the second is
$\sqrt{1-p}$. In dividing through by $1 - p$, one gets the upper bound in (b). □
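
The two estimates in Theorem 3 can be spot-checked on a finite probability space with equally likely sample points, where the superquantile is an exact finite sum. The following sketch is an illustration under that assumption; the sampling scheme, the seed, and the helper name `superquantile` are choices made here.

```python
import math
import random


def superquantile(xs, p):
    """(1/(1-p)) times the integral over (p, 1) of the empirical quantile function of xs."""
    s, n = sorted(xs), len(xs)
    total = 0.0
    for i, v in enumerate(s):                  # the quantile function equals v on (i/n, (i+1)/n]
        lo, hi = max(p, i / n), (i + 1) / n
        if hi > lo:
            total += v * (hi - lo)
    return total / (1.0 - p)


random.seed(0)
n = 500
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Y = [x + random.uniform(-0.5, 0.5) for x in X]     # a perturbed copy of X on the same sample space

mean_X = sum(X) / n
sigma_X = math.sqrt(sum((x - mean_X) ** 2 for x in X) / n)
mean_abs_diff = sum(abs(x - y) for x, y in zip(X, Y)) / n

checks = []
for p in (0.0, 0.25, 0.5, 0.9, 0.99):
    qx, qy = superquantile(X, p), superquantile(Y, p)
    bound_a = abs(qx - qy) <= mean_abs_diff / (1.0 - p) + 1e-12
    bound_b = mean_X - 1e-12 <= qx <= mean_X + sigma_X / math.sqrt(1.0 - p) + 1e-12
    checks.append(bound_a and bound_b)

print(checks)
```

Since the theorem holds for any pair of random variables on a common probability space, every entry of `checks` should be true, with the small tolerance only guarding against rounding.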

Convergence  in  distribution.  Convergence  of  a  sequence  of  random  variables  Xk  to  a 
random  variable  X  can  now  be  brought  into  focus.  There  are  several  concepts  of  importance, 
but  the  one  we  concentrate  on  is  convergence  in  distribution,  which  is  customarily  defined  by1' 


Xk  — >  X  in  distribution  when  FXk(x)  — >  Fx(x)  at  all  continuity  points  x  of  FX-  (3.23) 

17In  probability  theory,  random  variables  are  generally  presented  as  functions  on  a  probability  space.  The 
convergence  of  a  sequence  is  viewed  then  with  all  the  random  variables  regarded  as  functions  on  the  same 
probability  space.  Here  we  are  working  directly  with  distributions  and  only  nominally  with  particular  random 
variables  giving  rise  to  them,  which  are  not  unique.  However,  Skorohod’s  theorem  reconciles  these  points  of 
view;  cf.  [3,  Theorem  25.6].  It  says  that  when  distribution  functions  Fk  converge  to  a  distribution  function  F 
in  the  manner  of  (3.23),  it  is  possible  to  construct  random  variables  Xk  and  X  on  a  common  probability  space 
such  that  Fk  =  FXk ,  F  =  Fx,  and  the  functions  Xk  converge  pointwise  to  X. 
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This  classical  property  has  various  characterizations,  for  instance 


$$X_k \to X \text{ in distribution} \iff E[g(X_k)] \to E[g(X)] \text{ for bounded continuous } g, \tag{3.24}$$


which  is  recorded  in  Billingsley  [3,  Theorem  25.8].  Here  we  provide  characterizations  beyond 
such  classical  theory. 

Theorem 4 (characterizations of convergence in distribution). For a sequence of random variables $X_k$, convergence in distribution to a random variable $X$ is equivalent also to each of the following conditions:

(a) $\Gamma_{X_k}$ converges graphically to $\Gamma_X$,

(b) $\Delta_{X_k}$ converges graphically to $\Delta_X$,

(c) $Q_{X_k}(p) \to Q_X(p)$ at all continuity points $p$ of $Q_X$ in $(0,1)$,

(d) $E_{X_k}(x) \to E_X(x)$ for all $x \in (-\infty,\infty)$,

(e) $\bar Q_{X_k}(p) \to \bar Q_X(p)$ for all $p \in (0,1)$.

Proof.  The  equivalence  with  (a)  is  evident  from  the  description  of  graphical  convergence  in 
(2.25).  The  equivalence  with  (b)  then  follows  because  graphical  convergence  is  preserved  when 
taking  inverses.  Application  of  (2.25)  to  the  convergence  in  (b)  gives  the  equivalence  with  (c). 

When the defining property in (3.23) holds, the integrals $\int_x^\infty [1 - F_{X_k}(x')]\,dx'$ converge to $\int_x^\infty [1 - F_X(x')]\,dx'$ (inasmuch as the integrands are uniformly bounded). This yields the property in (d) through the fact that $E_X(x) = H_X(x) + x$ for the function $H_X$ in (3.11). For the opposite implication, if (d) holds we can use derivative estimates for convex functions in the form

$$\frac{E_X(x) - E_X(x-\varepsilon)}{\varepsilon} \le E_X^{\prime-}(x) \le E_X^{\prime+}(x) \le \frac{E_X(x+\varepsilon) - E_X(x)}{\varepsilon} \tag{3.25}$$

for any $\varepsilon > 0$ and in parallel

$$\frac{E_{X_k}(x) - E_{X_k}(x-\varepsilon)}{\varepsilon} \le E_{X_k}^{\prime-}(x) \le E_{X_k}^{\prime+}(x) \le \frac{E_{X_k}(x+\varepsilon) - E_{X_k}(x)}{\varepsilon}, \tag{3.26}$$

where $E_{X_k}^{\prime+}(x) = F_{X_k}(x)$ and $E_X^{\prime+}(x) = F_X(x)$. At a continuity point $x$ of $F_X$ we also have $E_X^{\prime-}(x) = F_X(x)$. Since the outer bounds in (3.26) approach those in (3.25) as $k \to \infty$ by (d), we conclude that

$$\frac{E_X(x) - E_X(x-\varepsilon)}{\varepsilon} \le \liminf_k F_{X_k}(x) \le \limsup_k F_{X_k}(x) \le \frac{E_X(x+\varepsilon) - E_X(x)}{\varepsilon}. \tag{3.27}$$

As $\varepsilon \searrow 0$, the upper and lower bounds in (3.27) both converge to $E_X'(x) = F_X(x)$ at the continuity point $x$, and therefore $F_{X_k}(x) \to F_X(x)$. Thus, (d) is equivalent to the defining property in (3.23) for convergence in distribution.

Next we observe that, since (d) concerns finite convex functions, the pointwise convergence there is equivalent to the epi-convergence of $E_{X_k}$ to $E_X$; recall (2.24). Applying (2.23) we get the epi-convergence of the conjugate functions $E_{X_k}^*$ to $E_X^*$, and then the equivalence with (e), once more via (2.24). □

By taking advantage of (2.24), the everywhere pointwise convergence in (d) and (e) can be replaced by pointwise convergence on a dense subset or uniform convergence on compact intervals of $(-\infty,\infty)$ and $(0,1)$, respectively. On the other hand, the pointwise convergence property in (c) can be elaborated in terms of the alternative descriptions in (2.25) and (2.26).

It is apparent from (e) that a superquantile is stable under perturbations of the underlying probability distribution. This has important consequences for optimization problems with superquantiles of parametric random variables as objective functions and constraints. If the superquantiles remain convex and finite as functions of the parameters, then Theorem 4 with (2.24) ensures epi-convergence of approximations obtained by replacing true probability distributions with approximating ones. Moreover, optimal solutions of problems with approximations will tend to those of the true problems, justifying the use of approximate probability distributions in applications.
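The stability of superquantiles under approximation of the distribution can be illustrated with a small experiment. The sketch below assumes, purely for illustration, that the true distribution is uniform on $[0,1]$ (whose superquantile is $(1+p)/2$) and that it is approximated by $k$ equally likely atoms at the cell midpoints; the function names are hypothetical.

```python
def superquantile(atoms, p):
    """Superquantile at level p of the distribution placing mass 1/n on each atom."""
    xs = sorted(atoms)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        # The quantile function equals x on (i/n, (i+1)/n]; integrate it over [p, 1].
        lo, hi = max(i / n, p), (i + 1) / n
        if hi > lo:
            total += x * (hi - lo)
    return total / (1.0 - p)

def approx_uniform(k):
    """k equally likely atoms at the midpoints of a uniform partition of [0, 1]."""
    return [(i + 0.5) / k for i in range(k)]

p = 0.85
exact = (1.0 + p) / 2.0  # superquantile of the uniform distribution on [0, 1]

# Approximation error of the superquantile for increasingly fine discretizations.
errors = [abs(superquantile(approx_uniform(k), p) - exact) for k in (10, 100, 1000)]
print(errors)
```

The errors shrink as the discrete distributions converge in distribution to the uniform one, which is the behavior that condition (e) of Theorem 4 leads one to expect.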

Other useful implications of convergence in distribution, which relax the boundedness of $g$ in (3.24), can be derived from conditions on moments. Let us say, for $r \ge 1$, that a continuous function $g : (-\infty,\infty) \to (-\infty,\infty)$ has growth rate at most $r$ when $\limsup_{|x|\to\infty} |g(x)|/|x|^r < \infty$. This is equivalent to having $c > 0$ such that $|g(x)| \le c(1 + |x|^r)$ everywhere.

Theorem 5 (further properties of convergence in distribution). If $X_k$ converges in distribution to $X$ and $\limsup_k E[\,|X_k|^{r(1+\varepsilon)}\,] < \infty$ for some $r \ge 1$ and $\varepsilon > 0$, then

$$E[g(X_k)] \to E[g(X)] \text{ (finite) for continuous } g \text{ having growth rate at most } r. \tag{3.28}$$

Proof. Consider $Y_k(p) = g(Q_{X_k}(p))$ and $Y(p) = g(Q_X(p))$ as random variables on the probability space $(0,1)$. We have $E[Y_k] = E[g(X_k)]$, $E[Y] = E[g(X)]$, by (3.3) and know from Theorem 4 that the convergence in distribution of $X_k$ to $X$ entails $Y_k$ as a function on $(0,1)$ converging pointwise to $Y$ almost everywhere. Our aim is to show that the growth assumptions imply $E[Y_k] \to E[Y]$ with $E[Y]$ finite. For that it suffices to confirm that those assumptions guarantee uniform integrability of the functions $Y_k$ in the sense that

$$\lim_{a\to\infty} \sup_k \int_{|Y_k| \ge a} |Y_k(p)|\,dp = 0, \tag{3.29}$$

see Billingsley [3, Theorem 25.12]. Because $g$ has growth rate $\le r$, there exists $c > 0$ such that $|Y_k(p)| \le c(1 + |Q_{X_k}(p)|^r)$ for all $k$. It will be enough therefore to confirm that

$$\lim_{b\to\infty} \sup_k \int_{Z_k \ge b} Z_k(p)\,dp = 0 \quad\text{for } Z_k(p) = |Q_{X_k}(p)|^r. \tag{3.30}$$

We have (via Billingsley [3, (25.13)]) the estimate for any $\varepsilon > 0$ that

$$\int_{Z_k \ge b} Z_k(p)\,dp \le \frac{1}{b^\varepsilon} \int_0^1 Z_k(p)^{1+\varepsilon}\,dp = \frac{1}{b^\varepsilon} \int_0^1 |Q_{X_k}(p)|^{r(1+\varepsilon)}\,dp = \frac{1}{b^\varepsilon}\, E[\,|X_k|^{r(1+\varepsilon)}\,].$$

Under our assumption that the expectations $E[\,|X_k|^{r(1+\varepsilon)}\,]$ are uniformly bounded from above (for $k$ sufficiently large), we obtain (3.30) and the desired uniform integrability. □

As a particular case of Theorem 5 one can take $g(x) = |x|^r$ in (3.28) to get convergence of moments: $E[\,|X_k|^r\,] \to E[\,|X|^r\,]$. Note that even $E[X_k] \to E[X]$ is not assured by convergence in distribution of $X_k$ to $X$ without something extra, despite having $Q_{X_k}(p) \to Q_X(p)$ almost everywhere with $E[X_k] = \inf \bar Q_{X_k}$ and $E[X] = \inf \bar Q_X$. Here the sufficient condition given for $E[X_k] \to E[X]$ is the boundedness of $E[\,|X_k|^{1+\varepsilon}\,]$ as $k \to \infty$ for some $\varepsilon > 0$.
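A standard counterexample, not taken from the article, shows why something extra is needed. Let $X_k$ take the value $k$ with probability $1/k$ and the value $0$ otherwise. Then $F_{X_k}(x) \to 1$ for every $x \ge 0$, so $X_k$ converges in distribution to $X \equiv 0$, yet $E[X_k] = 1$ for all $k$, and $E[\,|X_k|^{1+\varepsilon}\,] = k^{\varepsilon}$ is unbounded. The sketch below uses exact rational arithmetic and takes $\varepsilon = 1$ for simplicity; the function names are hypothetical.

```python
from fractions import Fraction

def distribution_value(k, x):
    """F_{X_k}(x) for X_k = k with probability 1/k and X_k = 0 otherwise."""
    if x < 0:
        return Fraction(0)
    if x < k:
        return 1 - Fraction(1, k)
    return Fraction(1)

def moment(k, r):
    """E[|X_k|^r] for a positive integer r: only the atom at k contributes."""
    return Fraction(k) ** r * Fraction(1, k)

for k in (10, 100, 1000):
    # F_{X_k}(5) tends to 1 = F_X(5), the mean stays at 1, the second moment grows like k.
    print(k, distribution_value(k, 5), moment(k, 1), moment(k, 2))
```

The means stay at 1 while the limit $X \equiv 0$ has mean 0, and the second moments are not bounded, so the hypothesis of Theorem 5 fails, as it must.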




Distribution densities and quantile densities. The symmetric view of second derivatives of convex functions and their conjugates, built at the end of Section 3 around the maximal monotone relations associated with them, will now be applied to random variables.

If a distribution function $F_X$ is differentiable, its derivative $F_X'$ gives the distribution density function $f_X$ for $X$. Then18

$$\int_{-\infty}^{\infty} g(x)\,dF_X(x) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx. \tag{3.31}$$

What is new now is the perspective from Theorem 1 that $f_X(x)$ is the second derivative $E_X''(x)$, and that a sort of duality lies in the background.

The measure $d\Gamma_X = dF_X$ has a counterpart $d\Delta_X = dQ_X$, the Lebesgue-Stieltjes measure associated with the quantile function $Q_X$ as a nondecreasing left-continuous function on $(0,1)$. We can equally contemplate the differentiability of $Q_X$, interpreting it as yielding a quantile density function $q_X$ on $(0,1)$, with $q_X(p)$ being the second derivative $E_X^{*\prime\prime}(p)$ according to Theorem 2. Then19

$$\int_0^1 h(p)\,dQ_X(p) = \int_0^1 h(p) q_X(p)\,dp. \tag{3.32}$$

It is interesting in this respect to note that, through change of variables,20 one has

$$\int_0^1 h(p)\,dQ_X(p) = \int_{-\infty}^{\infty} h(F_X(x))\,dx. \tag{3.33}$$

This is the quantile version of the distribution rule in the equivalence between (3.2) and (3.3).

Full  differentiability  of  Fx  and  Qx  is  not  a  prerequisite  to  all  insights.  The  available  facts 
can  be  specialized  without  that,  although  full  differentiability  does  produce  the  nicest  picture. 

Theorem 6 (duality of densities). The following relations hold in general:

$$\begin{aligned} & F_X \text{ has derivative } F_X'(x) > 0 \\ \iff\ & (x,p) \text{ is a nonsingular point of } \Gamma_X \\ \iff\ & (p,x) \text{ is a nonsingular point of } \Delta_X = \Gamma_X^{-1} \\ \iff\ & Q_X \text{ has derivative } Q_X'(p) > 0, \end{aligned} \tag{3.34}$$

in which case the derivatives are reciprocal, $Q_X'(p) = 1/F_X'(x)$.

In consequence,

$$\begin{aligned} & F_X \text{ is differentiable on } (-\infty,\infty) \text{ with } F_X'(x) > 0 \text{ for } x \in (\inf X, \sup X) \\ \iff\ & Q_X \text{ is differentiable on } (0,1) \text{ with } Q_X'(p) > 0 \text{ for } p \in (0,1), \end{aligned} \tag{3.35}$$

in which case

$$\begin{aligned} Q_X'(p) &= 1/F_X'(Q_X(p)) \text{ for } p \in (0,1), \\ F_X'(x) &= 1/Q_X'(F_X(x)) \text{ for } x \in (\inf X, \sup X). \end{aligned} \tag{3.36}$$

Proof. All of this is immediate from (2.27) with $F_X'$ and $Q_X'$ being the second derivatives of the convex functions $E_X$ and $E_X^*$. □
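The reciprocity in (3.36) can be checked on a concrete case. The sketch below assumes, as an illustrative example not drawn from the article, the standard exponential distribution, for which $F_X(x) = 1 - e^{-x}$ on $x \ge 0$ and $Q_X(p) = -\ln(1-p)$. The quantile density is approximated by a central difference, and all function names are hypothetical.

```python
import math

def quantile(p):
    """Quantile function of the standard exponential distribution."""
    return -math.log(1.0 - p)

def density(x):
    """Distribution density f_X = F_X' of the standard exponential distribution."""
    return math.exp(-x) if x >= 0 else 0.0

def quantile_density(p, h=1e-6):
    """Central-difference approximation of the quantile density q_X = Q_X'."""
    return (quantile(p + h) - quantile(p - h)) / (2.0 * h)

levels = (0.1, 0.5, 0.9)
# Each pair holds Q_X'(p) and 1 / F_X'(Q_X(p)), which (3.36) says should agree.
pairs = [(quantile_density(p), 1.0 / density(quantile(p))) for p in levels]
for p, (lhs, rhs) in zip(levels, pairs):
    print(f"p={p}: Q'(p)={lhs:.6f}, 1/F'(Q(p))={rhs:.6f}")
```

Both columns should agree with the closed form $1/(1-p)$ up to the finite-difference error.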


18For  measurable  functions  g  that  are  integrable  with  respect  to  the  dFx  measure. 

19For  measurable  functions  h  that  are  integrable  with  respect  to  the  dQx  measure. 

20Again applying the rule in Billingsley [3, Theorem 16.13]; the $dQ_X$ measure on $(0,1)$ is the one induced from the $dx$ measure on $(-\infty,\infty)$ by the function $F_X$.

4 Applications to Quantifying Risk


The importance of $Q_X$ and $\bar Q_X$ as so-called measures of risk has been recalled in Section 1, but more can be said with the facts now at our disposal. An explanation of the joint minimization formula (1.11) for $Q_X(p)$ and $\bar Q_X(p)$ will be taken up first. An extension to a parallel formula, in which $\bar Q_X(p)$ gives the argmin instead of the min, will follow.

Derivation of the joint rule for quantiles and superquantiles. The proof of Theorem 2 took shortcuts by utilizing facts in convex analysis, but a direct approach to calculating $E_X^*$ from $E_X$ by the Legendre-Fenchel transform was the original route to discovery of the minimization formula (1.11). The motivation in the first place was to obtain a minimization formula for quantiles based on knowing that $\Gamma_X$ is the graph of $\partial E_X$, namely

$$[Q_X(p), Q_X^+(p)] = \partial E_X^{-1}(p) = \partial E_X^*(p) = \operatorname{argmin}_x \{\, E_X(x) - xp \,\} \text{ for any } p \in (0,1), \tag{4.1}$$

by (2.17) and (2.18). Here

$$E_X(x) - px = (1-p)x + \big(E[\max\{x, X\}] - x\big) = (1-p)\Big(x + \frac{1}{1-p} E[\max\{0, X - x\}]\Big),$$

and consequently $[Q_X(p), Q_X^+(p)]$ is the set of $x$'s that minimize $x + \frac{1}{1-p} E[\max\{0, X - x\}]$. The argmin part of (1.11) is just this. At the same time we see that the Legendre-Fenchel formula $E_X^*(p) = \sup_x \{\, px - E_X(x) \,\}$, with attainment guaranteed for $p \in (0,1)$, translates to

$$-\frac{E_X^*(p)}{1-p} = \min_x \Big\{\, x + \frac{1}{1-p} E[\max\{0, X - x\}] \,\Big\} \text{ for } p \in (0,1).$$

The left side is $\bar Q_X(p)$ by Theorem 2, and this clinches the other half of the rule in (1.11).
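The joint rule can be tried out on an empirical distribution. In the sketch below, the sample is hypothetical and each outcome has probability $1/n$. Because the objective $x + \frac{1}{1-p}E[\max\{0, X - x\}]$ is then convex and piecewise linear with kinks only at sample points, it is enough to search over the sample points. The function names are introduced here for illustration only.

```python
import math

def quantile(sample, p):
    """Left-continuous empirical quantile: the smallest x with F(x) >= p."""
    xs = sorted(sample)
    return xs[math.ceil(p * len(xs)) - 1]

def superquantile(sample, p):
    """(1/(1-p)) times the integral of the empirical quantile function over [p, 1]."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        lo, hi = max(i / n, p), (i + 1) / n
        if hi > lo:
            total += x * (hi - lo)
    return total / (1.0 - p)

def objective(sample, p, x):
    """x + E[max{0, X - x}] / (1 - p), the expression minimized in the joint rule."""
    n = len(sample)
    return x + sum(max(0.0, s - x) for s in sample) / n / (1.0 - p)

# Hypothetical data; p is chosen so that the minimizer is unique.
sample = [-2.0, -0.5, 0.0, 0.3, 1.0, 1.2, 4.0, 9.0]
p = 0.8

# Minimize over the sample points, where a piecewise linear convex function attains its minimum.
best_value, best_x = min((objective(sample, p, x), x) for x in sample)

print("argmin:", best_x, "quantile:", quantile(sample, p))
print("min:", best_value, "superquantile:", superquantile(sample, p))
```

The argmin should coincide with the quantile and the minimum value with the superquantile. When $p$ is a multiple of $1/n$ the argmin is an interval $[Q_X(p), Q_X^+(p)]$ rather than a single point, and a search like this one may return either endpoint.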

Extension  to  “higher-order  superquantiles.”  We  proceed  now  to  look  for  an  analog  of 
(1.11)  in  which  the  superquantiles  take  the  place  of  quantiles  in  giving  the  minimum.  The  reason 
for  wanting  to  do  this  is  the  role  of  quantiles,  and  potentially  superquantiles,  in  generalized 
regression  of  the  kind  considered  in  [24]  and  [25],  but  explaining  all  that  here  would  carry  us 
far away from the current theme. “Superquantile regression” is the subject introduced and
developed  in  our  paper  [21],  with  support  from  results  secured  here. 

An observation to start from is that the main term $E[\max\{0, X\}]$ in (1.11) has the additional expressions

$$\int_{-\infty}^{\infty} \max\{0, x\}\,dF_X(x) = \int_0^1 \max\{0, Q_X(p)\}\,dp. \tag{4.2}$$

It turns out that all we need to do in order to build the right analog of (1.11) is to replace $F_X$ by a different but closely related distribution function $\bar F_X$ such that

$$\text{the quantiles of } \bar F_X \text{ are the superquantiles of } F_X. \tag{4.3}$$

As indicated graphically in Section 1, this superdistribution function is obtained by “inverting” $\bar Q_X$, namely

$$\bar F_X(x) = \begin{cases} \bar Q_X^{-1}(x) & \text{for } \lim_{p\searrow 0} \bar Q_X(p) < x < \lim_{p\nearrow 1} \bar Q_X(p), \\ 0 & \text{for } x \le \lim_{p\searrow 0} \bar Q_X(p), \\ 1 & \text{for } x \ge \lim_{p\nearrow 1} \bar Q_X(p). \end{cases} \tag{4.4}$$

It is the distribution function for the random variable $\bar X$ associated with $X$ by (1.7), so that $Q_{\bar X}(p) = \bar Q_X(p)$.

Much that has already been worked out for $F_X$ carries over to $\bar F_X$, as long as $E[\,|\bar X|\,] < \infty$ in accordance with the blanket assumption in (3.1) that we have been relying on. That is the case when $E[X^2] < \infty$, as seen through the estimate in Theorem 3(b). In particular, then,

$$\bar F_X = \bar E_X' \text{ for } \bar E_X(x) = \int_{-\infty}^{\infty} \max\{x, x'\}\,d\bar F_X(x') = \int_0^1 \max\{x, \bar Q_X(p)\}\,dp, \tag{4.5}$$

where the equivalence holds as an echo of (3.12) in the face of (4.3). The function $\bar E_X$ is finite and convex on $(-\infty,\infty)$, again with $\bar E_X(x) - x$ positive and tending to 0 as $x \to \infty$.

The conjugate function $\bar E_X^*$ can be determined by applying Theorem 2 in this setting. The main ingredient in the resulting formula is the replacement of $\bar Q_X(p)$ by a higher analog, namely

$$\bar{\bar Q}_X(p) = \frac{1}{1-p} \int_{\bar Q_X(p)}^{\infty} x'\,d\bar F_X(x') = \frac{1}{1-p} \int_p^1 \bar Q_X(p')\,dp'. \tag{4.6}$$

This “supersuperquantile” is the conditional expectation of $\bar X$ in its $p$-tail with respect to the $\bar E_X' = \bar F_X$ distribution. The complications with the original definition of the $p$-tail fall away because $\bar F_X$ has no jumps; the $\bar F_X$ distribution has no “probability atoms.” With respect to $\bar F_X$, the interval $[\bar Q_X(p), \infty)$ has probability $1 - p$. As a matter of fact, $\bar{\bar Q}_X = \bar Q_{\bar X}$.

This suggests, through (4.2), that the analog of the expression $V_p$ in (1.11) as a “measure of regret” might be taken to be

$$\bar V_p(X) = \frac{1}{1-p} \int_{-\infty}^{\infty} \max\{0, x\}\,d\bar F_X(x) = \frac{1}{1-p} \int_0^1 \max\{0, \bar Q_X(p')\}\,dp', \tag{4.7}$$

and this does indeed give us what we want.

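Theorem 7 (superquantiles as quantiles). Suppose $E[\,|X|^2\,] < \infty$. Then, as a measure of risk, $\mathcal{R}(X) = \bar{\bar Q}_X(p)$ has the coherency properties in (1.10), like $\mathcal{R}(X) = \bar Q_X(p)$. In terms of $\bar V_p$ defined in (4.7), the two can be calculated simultaneously for $p \in (0,1)$ by

$$\begin{aligned} \bar Q_X(p) &= \operatorname{argmin}_x \{\, x + \bar V_p(X - x) \,\}, \\ \bar{\bar Q}_X(p) &= \min_x \{\, x + \bar V_p(X - x) \,\}. \end{aligned} \tag{4.8}$$

The functional $\bar V_p$ itself has the regret properties of $V_p$ in (1.12), specifically

$$\begin{aligned} & \bar V_p(X) \le \bar V_p(X') \text{ when } X \le X' \text{ almost surely}, \\ & \bar V_p(X + X') \le \bar V_p(X) + \bar V_p(X'), \\ & \bar V_p(\lambda X) = \lambda \bar V_p(X) \text{ for } \lambda > 0, \\ & \bar V_p(X) \ge E[X], \text{ with equality holding only when } X \equiv 0. \end{aligned} \tag{4.9}$$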

Proof. The parallel structure suffices to confirm (4.8). The coherency properties of $\mathcal{R}(X) = \bar Q_X(p)$ in (1.10) with respect to $X$ lead through the second integral expression in (4.6) to those same properties holding for $\mathcal{R}(X) = \bar{\bar Q}_X(p)$. The properties in (4.9) similarly come from invoking (1.10) for $\bar Q_X(p')$ in the second formula for $\bar V_p(X)$ in (4.7) and calling on the fact that $\bar Q_X(p)$ is an increasing function with $E[X]$ as its infimum. □

The minimization in (4.8) may seem to demand too much knowledge of the regret functional $\bar V_p$ to be practical, but properties of the superquantile integrand, such as the estimates in Theorem 3, can come to the rescue. The elementary theory of integration (approximation of integrands by step functions or piecewise linear functions) leads to approximating expressions for $\bar V_p(X)$ that come from linear combinations of superquantiles $\bar Q_X(p_k)$. The formula for $\bar Q_X(p)$ in (1.11) for any $p$ can be employed to calculate the value of such an expression for any $X$. Upper and lower estimates can be developed for the closeness of such an expression to $\bar V_p(X)$. Such estimates are worked out in our paper [21].
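A step-function approximation of this kind can be sketched for a case where everything is available in closed form. The example below assumes, for illustration only, that $X$ is uniform on $[0,1]$, so that $\bar Q_X(p) = (1+p)/2$ and, by (4.6), $\bar{\bar Q}_X(p) = (3+p)/4$. It approximates $\bar V_p(X - x)$ in (4.7) by a midpoint rule over $m$ levels $p_j$, using $\bar Q_{X-x}(p') = \bar Q_X(p') - x$, and then minimizes over a grid of $x$ values. The grid sizes and function names are arbitrary choices made here, not taken from the article or from [21].

```python
def superquantile_uniform(p):
    """Closed-form superquantile of the uniform distribution on [0, 1]."""
    return (1.0 + p) / 2.0

def regret_bar(p, x, m=1000):
    """Midpoint-rule approximation of bar V_p(X - x) in (4.7): an average of
    max{0, bar Q_X(p_j) - x} over m equally spaced levels p_j, divided by 1 - p."""
    total = 0.0
    for j in range(m):
        pj = (j + 0.5) / m
        total += max(0.0, superquantile_uniform(pj) - x)
    return total / m / (1.0 - p)

def objective(p, x):
    """The expression x + bar V_p(X - x) minimized in (4.8)."""
    return x + regret_bar(p, x)

p = 0.6
grid = [i / 1000 for i in range(1001)]  # candidate x values in [0, 1]

# Grid search for the minimizer and the minimum value of the objective.
best_value, best_x = min((objective(p, x), x) for x in grid)

print("argmin:", best_x, "expected superquantile:", superquantile_uniform(p))
print("min:", best_value, "expected second-order superquantile:", (3.0 + p) / 4.0)
```

For $p = 0.6$ the argmin should be close to $\bar Q_X(p) = 0.8$ and the minimum close to $\bar{\bar Q}_X(p) = 0.9$, matching the two formulas in (4.8).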

Stochastic dominance. Another notion that enters the study of risk is stochastic dominance. Two versions, known as first-order and second-order, are especially important, but the issue of “usage orientation” of a random variable again has to be respected. Most often, stochastic dominance is articulated for the context of a random variable $X$ being preferable to a random variable $Y$ when its outcomes are, by some quantification standard, generally higher. That is profit/gain orientation, but in this article we are treating cost/loss/damage orientation, so some inequalities need to be reversed in identifying the “dominance” of $X$ over $Y$ with $X$ being “better” than $Y$.

In profit/gain/benefit orientation, it is customary to define first-order stochastic dominance $X \ge_1 Y$ as corresponding to $F_X \le F_Y$ (the graph of $F_X$ therefore being to the right of the graph of $F_Y$). Second-order stochastic dominance $X \ge_2 Y$ is taken as $F_X^{(2)} \le F_Y^{(2)}$; cf. (3.10). It is well known that these properties translate into having $E[g(X)] \ge E[g(Y)]$ for a class of increasing functions in the first case and a class of increasing concave functions in the second. However, some authors prefer to take such expectation properties directly as the definition, since they provide the main motivation for the concept in applications. We follow that pattern here in adapting to cost/loss/damage orientation.

Definition (first- and second-order dominance, inverted). First-order stochastic dominance of $X$ over $Y$ in cost/loss/damage orientation, to be denoted by $X \le_1' Y$ here,21 and second-order stochastic dominance, $X \le_2' Y$, mean the following:

$$\begin{aligned} X \le_1' Y &\iff E[g(X)] \le E[g(Y)] \text{ for continuous bounded increasing } g, \\ X \le_2' Y &\iff E[g(X)] \le E[g(Y)] \text{ for finite convex increasing } g. \end{aligned} \tag{4.12}$$

Recall here that a finite convex function $g$ is automatically continuous. Also, it always has $g(x) \ge ax + b$ for some $a \ge 0$ and $b \in (-\infty,\infty)$, so that the expectations in (4.12) are sure to be well defined, although possibly $\infty$, but not $-\infty$ (under our blanket assumption (3.1) on finite expectations).

If $g$ is interpreted as a penalty function, the inequalities in (4.12) concern expected penalties under $X$ and $Y$. The two conditions then describe situations involving a pair of cost/loss random variables $X$ and $Y$ in which $X$ is less risky than $Y$ regardless of the particular penalty function $g$ that may have to be faced, within some category.22 This is attractive in situations where a decision maker may have little knowledge of the penalties. An important example for stochastic dominance in the profit/gain orientation comes up in finance, where penalty functions are replaced by utility functions and convexity in the second-order case by concavity.

21The prime is a reminder of the switch from the usual orientation as seen in textbooks.

The second property in (4.12) is also known as “increasing convex order,” $\le_{ic}$, cf. [12], and was featured by Dentcheva and Martinez [4] in their adaptation to cost/loss orientation.

Theorem 8 (stochastic dominance in cost/loss/damage orientation). First-order stochastic dominance is characterized by

$$X \le_1' Y \iff F_X \ge F_Y \iff Q_X \le Q_Y. \tag{4.13}$$

Second-order stochastic dominance is characterized by

$$X \le_2' Y \iff E_X \le E_Y \iff \bar Q_X \le \bar Q_Y. \tag{4.14}$$

Proof. We rely here, in part, on characterizations in the profit/gain orientation furnished by Föllmer and Schied [7] (and elsewhere). Their Theorem 2.70 covers (4.13) with a slight difference coming from our focus on the left-continuous quantile function. They contemplate a class of “quantile functions” between these and their right-continuous partners, and accordingly replace the pointwise inequality $Q_X \le Q_Y$ by an almost everywhere inequality.

For (4.14) the derivation is a bit more complicated and specialized toward the concepts in this article. The cost/loss version of one characterization of second-order dominance in Theorem 2.70 of [7] is that

$$X \le_2' Y \iff E[\max\{0, X - c\}] \le E[\max\{0, Y - c\}] \text{ for all } c. \tag{4.15}$$

Because $E_X(c) = E[\max\{c, X\}] = c + E[\max\{0, X - c\}]$ and similarly $E_Y(c)$, we can translate (4.15) to saying that $E_X(c) \le E_Y(c)$ for all $c$. The observation to make next is that the Legendre-Fenchel transform converts $E_X \le E_Y$ to $E_X^* \ge E_Y^*$. The formula in Theorem 2 lets us identify this with $\bar Q_X \le \bar Q_Y$. □
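The two characterizations in (4.14) can be compared on a simple pair of discrete distributions. In the sketch below, chosen for illustration and not taken from the article, $X$ is the constant 0 and $Y$ takes the values $-1$ and $1$ with probability $1/2$ each. All outcomes listed are equally likely, the checks are carried out on finite grids only, and the function names are hypothetical.

```python
import math

def superexpectation(sample, c):
    """E_X(c) = E[max{c, X}] for equally likely outcomes."""
    return sum(max(c, x) for x in sample) / len(sample)

def quantile(sample, p):
    """Left-continuous quantile Q_X(p) for equally likely outcomes."""
    xs = sorted(sample)
    return xs[math.ceil(p * len(xs)) - 1]

def superquantile(sample, p):
    """(1/(1-p)) times the integral of the quantile function over [p, 1]."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        lo, hi = max(i / n, p), (i + 1) / n
        if hi > lo:
            total += x * (hi - lo)
    return total / (1.0 - p)

X = [0.0, 0.0]    # constant cost 0
Y = [-1.0, 1.0]   # cost -1 or 1 with probability 1/2 each

cs = [i / 10 for i in range(-30, 31)]   # grid of thresholds c
ps = [i / 100 for i in range(1, 100)]   # grid of probability levels p

# Grid checks of the two equivalent conditions in (4.14).
second_order_via_E = all(superexpectation(X, c) <= superexpectation(Y, c) + 1e-12 for c in cs)
second_order_via_Q = all(superquantile(X, p) <= superquantile(Y, p) + 1e-12 for p in ps)
# Grid check of the quantile condition in (4.13).
first_order = all(quantile(X, p) <= quantile(Y, p) for p in ps)

print(second_order_via_E, second_order_via_Q, first_order)
```

Both second-order tests agree, as (4.14) predicts. The first-order test fails, since $Q_X(p) = 0 > -1 = Q_Y(p)$ for $p \le 1/2$, so this pair is ordered in the second-order sense only.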

Stochastic dominance has important applications to constraint modeling in stochastic optimization; see Dentcheva and Ruszczyński [5].

Comonotonicity.  Another  way  that  monotone  relations  enter  the  framework  of  risk  is 
through  the  property  of  comonotonicity. 

Definition (comonotonicity of random variables). Two random variables $X_1$ and $X_2$ are said to be comonotone if the essential range of outcomes of the pair $(X_1, X_2)$ is a monotone relation $\Gamma$ in $\mathbb{R} \times \mathbb{R}$.23

This means roughly that the two random variables move in tandem; the risk in one cannot hedge against the risk in the other. Indeed, it implies the existence of a third random variable $X$ along with increasing Lipschitz continuous functions $f_1$ and $f_2$ such that $X_1 = f_1(X)$ and $X_2 = f_2(X)$. For this, one can simply take $X = X_1 + X_2$ and apply the Minty parameterization of a maximal extension of $\Gamma$; cf. (2.3).

22This explains why we use $\le$ in (4.12). It signals “less risky.”

23The essential range is the smallest closed set that, with probability 1, contains all outcomes.

Besides the motivation for comonotonicity as capturing this tandem behavior of a pair of random variables, there are consequences for their quantiles and superquantiles. The fact that comonotonicity of random variables leads to additivity of their quantiles, the initial property below, is well known; cf. [7, Lemma 4.84]. We offer an argument for the converse and indicate how this ties in with superquantiles and superexpectations.

Theorem 9 (characterizations of comonotonicity). The following properties of a pair of random variables $X_1$ and $X_2$ are equivalent to comonotonicity:

(a) $Q_{X_1+X_2}(p) = Q_{X_1}(p) + Q_{X_2}(p)$ for all $p \in (0,1)$,

(b) $\bar Q_{X_1+X_2}(p) = \bar Q_{X_1}(p) + \bar Q_{X_2}(p)$ for all $p \in (0,1)$,

(c) $E_{X_1+X_2}(x) = \min_{x_1+x_2=x} \{\, E_{X_1}(x_1) + E_{X_2}(x_2) \,\}$ for all $x \in (-\infty,\infty)$.

Proof. First we suppose comonotonicity and show that then (a) holds. The monotonicity of the essential range $\Gamma$ of $(X_1, X_2)$ makes the function $\phi : (x_1, x_2) \to x_1 + x_2 = x$ map $\Gamma$ monotonically one-to-one into the real line. The joint probability distribution of $X_1$ and $X_2$ on $\mathbb{R} \times \mathbb{R}$, concentrated in $\Gamma$, is thereby transformed into the probability distribution of $X = X_1 + X_2$, concentrated in $\phi(\Gamma)$. For any $p \in (0,1)$ the quantile $Q_X(p)$ gives the highest point $x$ of $\phi(\Gamma)$ such that $F_X(x) < p$. The unique antecedent $\phi^{-1}(x) = (x_1(x), x_2(x)) \in \Gamma$ then has to be $(Q_{X_1}(p), Q_{X_2}(p))$. Thus, $Q_X(p) = Q_{X_1}(p) + Q_{X_2}(p)$, as claimed.

To demonstrate the converse, that (a) implies comonotonicity, we can make use of the fact that the essential range of a random variable $X$ is the closure of the range of its quantile function $Q_X$. It is traced by $Q_X(p)$ as $p$ goes from 0 to 1 in $(0,1)$, except that where jumps occur the right limit $Q_X^+(p)$ needs also to be brought in. This can be invoked for $X_1$, $X_2$ and $X = X_1 + X_2$ to see that, when (a) holds, the probability parameter $p$ traces the range of $(X_1, X_2)$ monotonically as $(Q_{X_1}(p), Q_{X_2}(p))$. This range is then a monotone relation.

The equivalence between (a) and (b) is obvious from the formula (1.6) for superquantiles in terms of quantiles. This yields a further equivalence through Theorem 2 with having

$$E_{X_1+X_2}^*(p) = E_{X_1}^*(p) + E_{X_2}^*(p) \text{ for all } p.$$

Applying the rule in convex analysis that the conjugate of a sum is obtained by the operation # of “inf-convolution” on the conjugate functions,24

$$(E_{X_1}^* + E_{X_2}^*)^*(x) = (E_{X_1}^{**} \,\#\, E_{X_2}^{**})(x) = \inf_{x_1+x_2=x} \{\, E_{X_1}^{**}(x_1) + E_{X_2}^{**}(x_2) \,\},$$

we arrive at (c). □
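Property (a) of Theorem 9 is easy to observe on a small discrete example. The sketch below assumes a hypothetical driver $U$ with four equally likely values and builds $X_1 = U^2$ and $X_2 = 2U + 1$, both increasing in $U$ on the chosen range and hence comonotone. For contrast, it also pairs $X_1$ with the outcomes of $X_2$ in reverse order, which destroys comonotonicity. The names and data are illustrative only.

```python
import math

def quantile(sample, p):
    """Left-continuous quantile Q_X(p) for equally likely outcomes."""
    xs = sorted(sample)
    return xs[math.ceil(p * len(xs)) - 1]

# Hypothetical common driver with four equally likely values.
U = [0.0, 1.0, 2.0, 3.0]
X1 = [u * u for u in U]           # increasing in u on [0, 3]
X2 = [2.0 * u + 1.0 for u in U]   # increasing in u
X2_reversed = list(reversed(X2))  # same distribution as X2, paired with X1 in opposite order

# Outcome i of each list is understood to occur jointly with outcome i of the others.
comonotone_sum = [a + b for a, b in zip(X1, X2)]
anti_sum = [a + b for a, b in zip(X1, X2_reversed)]

ps = [i / 20 for i in range(1, 20)]  # grid of probability levels

comonotone_additive = all(
    abs(quantile(comonotone_sum, p) - (quantile(X1, p) + quantile(X2, p))) < 1e-12
    for p in ps
)
anti_additive = all(
    abs(quantile(anti_sum, p) - (quantile(X1, p) + quantile(X2_reversed, p))) < 1e-12
    for p in ps
)

print(comonotone_additive, anti_additive)
```

Quantile additivity holds for the comonotone pair and fails for the reversed pairing, even though the marginal distributions are the same in both cases.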

Theorem  9  relates  also  to  an  associated  concept  of  comonotonicity  for  measures  of  risk  due 
to  Ogryczak  and  Ruszczynski  [14],  [15],  [16],  namely  that 

$$\mathcal{R} \text{ is comonotonic if } \mathcal{R}(X_1 + X_2) = \mathcal{R}(X_1) + \mathcal{R}(X_2) \text{ when } X_1, X_2 \text{ are comonotone.} \tag{4.16}$$

24For more on inf-convolution, see [17] and [26].

The theorem says, among other things, that

$$\text{the risk measure } \mathcal{R}(X) = \bar Q_X(p) \text{ is comonotonic for every } p \in (0,1). \tag{4.17}$$

It is easy to see that this carries over also to the mixed superquantile measures of risk in (4.10) and (4.11). More on this topic can be found in the book of Föllmer and Schied [7] in coordination with applications in finance.